GB2623037A - Early invocation for contextual data processing - Google Patents

Early invocation for contextual data processing

Info

Publication number
GB2623037A
Authority
GB
United Kingdom
Prior art keywords
data item
contextual data
natural language
generate
language processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2400873.2A
Other versions
GB202400873D0 (en)
Inventor
Jacob Ponnu
Kalman Adam
Luo Ruiqi
Zhao Jingqian
Chen Xi
Zhu Yuqiang
Maddipati Krupal
Yan Wenbo
Yang Liu
Xie Meng
Devillaine Adriano
Kumar Kollu Uday
Ramachandra Prathap
Alizerine Dzialo Charlotte
Alnuaimat Mohammad
P Vinodkrishnan Nalledath
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Amazon Technologies Inc
Original Assignee
Amazon Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US17/304,720 (published as US20220415311A1)
Priority claimed from US17/304,714 (published as US11657805B2)
Application filed by Amazon Technologies Inc filed Critical Amazon Technologies Inc
Publication of GB202400873D0
Publication of GB2623037A
Current legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue

Abstract

A speech processing system uses contextual data to determine the specific domains, subdomains, and applications appropriate for taking action in response to spoken commands and other utterances. The system can use signals and other contextual data associated with an utterance, such as location signals, content catalog data, data regarding historical usage patterns, data regarding content visually presented on a display screen of a computing device when an utterance was made, other data, or some combination thereof.
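The routing idea above lends itself to a small illustration. The following Python sketch shows how contextual signals of the kinds the abstract lists (on-screen content, usage history, location) might be combined to score candidate domains for an utterance. It is a minimal, hypothetical sketch, not code from the patent: every name (UtteranceContext, score_domains, the domains and weights) is invented for illustration. A second sketch after the claims illustrates the early-invocation mechanism itself.

```python
# Hypothetical sketch of context-based domain scoring; names and
# weights are illustrative only and do not come from the patent.
from dataclasses import dataclass, field


@dataclass
class UtteranceContext:
    """Contextual signals available when an utterance arrives."""
    device_location: str | None = None
    on_screen_content: list[str] = field(default_factory=list)
    recent_domains: list[str] = field(default_factory=list)


def score_domains(intent: str, ctx: UtteranceContext) -> dict[str, float]:
    """Score candidate domains using the intent plus contextual signals."""
    scores = {"music": 0.0, "video": 0.0, "shopping": 0.0}
    if intent == "PlayContent":
        # An ambiguous "play" intent could go to music or video.
        scores["music"] += 1.0
        scores["video"] += 1.0
    if ctx.on_screen_content:
        # Content visible on the device's display is evidence for video.
        scores["video"] += 0.5
    for domain in ctx.recent_domains:
        # Historical usage patterns nudge routing toward familiar domains.
        if domain in scores:
            scores[domain] += 0.25
    return scores


ctx = UtteranceContext(on_screen_content=["movie results"],
                       recent_domains=["video"])
best_domain, _ = max(score_domains("PlayContent", ctx).items(),
                     key=lambda kv: kv[1])
print(best_domain)  # video
```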

Claims (15)

1. A computer-implemented method comprising: as performed by a computing system comprising one or more computer processors configured to execute specific instructions, executing a set of natural language processing actions on natural language input to generate intent data, wherein the natural language input represents a user utterance, and wherein the intent data comprises a semantic representation of the user utterance; determining, at an integration point during the set of natural language processing actions, to generate a contextual data item using an initial data item associated with at least one of the set of natural language processing actions; generating the contextual data item based at least partly on the initial data item concurrently with performance of at least a portion of the set of natural language processing actions; and providing access to the contextual data item, wherein the contextual data item is accessed during performance of at least one of: the set of natural language processing actions, or a set of response determination actions.
2. The computer-implemented method of claim 1, further comprising determining, using a registry, a contextual data action to be performed to generate the contextual data item, wherein the registry specifies the contextual data action and one or more initial data items to be used to generate the contextual data item.
3. The computer-implemented method of claim 2, wherein determining the contextual data action comprises determining one of: evaluation of the one or more initial data items using a model to generate a prediction regarding an aspect of the user utterance; performance of a calculation using the one or more initial data items to precompute a value; or acquisition of a value from a data store using the one or more initial data items.
4. The computer-implemented method of claim 1, wherein generating the contextual data item comprises generating data representing at least one of: a location of a voice-enabled device from which the natural language input is received, an identifier of the voice-enabled device, a content catalog associated with a user profile, historical interactions associated with the user profile, a prediction regarding an aspect of the user utterance, or a comparison of a plurality of interpretations of the user utterance.
5. The computer-implemented method of claim 1, further comprising: determining, at a second integration point during the set of natural language processing actions, to generate a second contextual data item using a second initial data item; and generating the second contextual data item based at least partly on the second initial data item concurrently with performance of at least a second portion of the set of natural language processing actions.
6. The computer-implemented method of claim 1, further comprising: in response to generating the contextual data item, determining to generate a second contextual data item using the contextual data item; and generating the second contextual data item based at least partly on the contextual data item concurrently with performance of at least a second portion of the set of natural language processing actions.
7. The computer-implemented method of claim 1, wherein providing access to the contextual data item comprises storing the contextual data item in a data store accessible during performance of the at least one of the set of natural language processing actions, or the set of response determination actions.
8. The computer-implemented method of claim 1, wherein executing the set of natural language processing actions comprises: generating text data using audio data and an automatic speech recognition ("ASR") process, wherein the text data represents at least a portion of the user utterance; and generating intent data using the text data and a natural language understanding ("NLU") process, wherein the intent data comprises a semantic representation of the user utterance.
9. A system comprising: computer-readable memory storing executable instructions; and one or more processors in communication with the computer-readable memory and configured by the executable instructions to at least: execute a set of natural language processing actions on natural language input to generate intent data, wherein the natural language input represents a user utterance, and wherein the intent data comprises a semantic representation of the user utterance; determine, at an integration point during the set of natural language processing actions, to generate a contextual data item using an initial data item associated with at least one of the set of natural language processing actions; generate the contextual data item based at least partly on the initial data item concurrently with performance of at least a portion of the set of natural language processing actions; and provide access to the contextual data item, wherein the contextual data item is accessed during performance of at least one of: the set of natural language processing actions, or a set of response determination actions.
10. The system of claim 9, wherein the one or more processors are configured by further executable instructions to determine, using a registry, a contextual data action to be performed to generate the contextual data item, wherein the registry specifies the contextual data action and one or more initial data items to be used to generate the contextual data item.
11. The system of claim 9, wherein the one or more processors are configured by further executable instructions to at least: determine, at a second integration point during the set of natural language processing actions, to generate a second contextual data item using a second initial data item; and generate the second contextual data item based at least partly on the second initial data item concurrently with performance of at least a second portion of the set of natural language processing actions.
12. The system of claim 9, wherein the one or more processors are configured by further executable instructions to at least: in response to generating the contextual data item, determine to generate a second contextual data item using the contextual data item; and generate the second contextual data item based at least partly on the contextual data item concurrently with performance of at least a second portion of the set of natural language processing actions.
13. The system of claim 9, wherein the one or more processors configured to generate the contextual data item based at least partly on the initial data item are configured by further executable instructions to: evaluate the initial data item using a model to generate a prediction regarding an aspect of the user utterance; and generate a response to the user utterance based at least partly on the prediction, wherein the prediction relates to a likely status of the user utterance as one of an initial user utterance of a multi-turn dialog or a follow-up user utterance of a multi-turn dialog.
14. The system of claim 9, wherein the one or more processors configured to provide access to the contextual data item are configured by further executable instructions to respond to a request for the contextual data item during performance of the at least one of the set of natural language processing actions, or the set of response determination actions.
15. The system of claim 9, wherein the one or more processors are configured by further executable instructions to: determine an application to which intent data comprising a semantic representation of the user utterance is to be routed for generation of a response to the user utterance; and generate a response to the user utterance using the application and the intent data.
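As a rough sketch of claims 1-3 and 8 above, the following Python snippet models the claimed flow: an ASR step produces text, an integration point consults a registry to decide which contextual data item to generate, and that generation runs concurrently with NLU so the item is ready when response determination begins. All names here (REGISTRY, run_pipeline, the stand-in ASR/NLU coroutines) are hypothetical illustrations, not the patent's implementation.

```python
# Hypothetical asyncio model of early invocation (claims 1-3, 8);
# names and structure are illustrative, not from the patent.
import asyncio

# Claim 2: a registry specifying, per integration point, the contextual
# data action and the initial data items used to generate the item.
REGISTRY = {
    "post_asr": {"action": "predict_follow_up", "inputs": ["text_data"]},
}


async def asr(audio: bytes) -> str:
    """Stand-in ASR step (claim 8): audio data -> text data."""
    await asyncio.sleep(0.01)
    return "play the next episode"


async def nlu(text: str) -> dict:
    """Stand-in NLU step (claim 8): text data -> intent data."""
    await asyncio.sleep(0.02)
    return {"intent": "PlayContent", "slots": {"target": "next episode"}}


async def contextual_action(name: str, inputs: dict) -> dict:
    """Generate a contextual data item from the initial data item(s),
    e.g. a model prediction that the utterance is a follow-up turn in a
    multi-turn dialog (claims 3 and 13)."""
    await asyncio.sleep(0.02)
    return {"name": name, "is_follow_up": True}


async def run_pipeline(audio: bytes) -> tuple[dict, dict]:
    text = await asr(audio)
    # Integration point "post_asr" (claim 1): start generating the
    # contextual data item concurrently with the remaining NLU work.
    entry = REGISTRY["post_asr"]
    context_task = asyncio.create_task(
        contextual_action(entry["action"], {"text_data": text}))
    intent = await nlu(text)
    # The item is ready for response determination (claim 1's "providing
    # access") by the time the intent data is available.
    context_item = await context_task
    return intent, context_item


intent, context_item = asyncio.run(run_pipeline(b"audio-bytes"))
print(intent, context_item)
```

Because the contextual action overlaps the NLU step rather than running after it, the contextual data item adds little or no wall-clock latency, which is the point of invoking it early.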
GB2400873.2A 2021-06-24 2022-06-17 Early invocation for contextual data processing Pending GB2623037A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US17/304,720 US20220415311A1 (en) 2021-06-24 2021-06-24 Early invocation for contextual data processing
US17/304,714 US11657805B2 (en) 2021-06-24 2021-06-24 Dynamic context-based routing of speech processing
PCT/US2022/034014 WO2022271555A1 (en) 2021-06-24 2022-06-17 Early invocation for contextual data processing

Publications (2)

Publication Number Publication Date
GB202400873D0 (en) 2024-03-06
GB2623037A (en) 2024-04-03

Family

ID=82557940

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2400873.2A Pending GB2623037A (en) 2021-06-24 2022-06-17 Early invocation for contextual data processing

Country Status (3)

Country Link
DE (1) DE112022003216T5 (en)
GB (1) GB2623037A (en)
WO (1) WO2022271555A1 (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10515625B1 (en) 2017-08-31 2019-12-24 Amazon Technologies, Inc. Multi-modal natural language processing

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030112277A1 (en) * 2001-12-14 2003-06-19 Koninklijke Philips Electronics N.V. Input of data using a combination of data input systems
US20190324780A1 (en) * 2018-04-20 2019-10-24 Facebook, Inc. Contextual Auto-Completion for Assistant Systems

Also Published As

Publication number Publication date
WO2022271555A1 (en) 2022-12-29
DE112022003216T5 (en) 2024-04-04
GB202400873D0 (en) 2024-03-06

Similar Documents

Publication Publication Date Title
US11887582B2 (en) Training and testing utterance-based frameworks
CN111033492B (en) Providing command bundle suggestions for automated assistants
CN111859994B (en) Machine translation model acquisition and text translation method, device and storage medium
US8239201B2 (en) System and method for audibly presenting selected text
US20200234695A1 (en) Determining phonetic relationships
KR102625184B1 (en) Speech synthesis training to create unique speech sounds
US20180253420A1 (en) Output sentence generation apparatus, output sentence generation method, and output sentence generation program
US20190205301A1 2017-12-29 2019-07-04 Combo of Language Understanding and Information Retrieval
GB2610709A (en) Synthetic speech processing
US20240055002A1 (en) Detecting near matches to a hotword or phrase
US20230169980A1 (en) Detecting and handling failures in other assistants
JP6712754B2 (en) Discourse function estimating device and computer program therefor
GB2623037A (en) Early invocation for contextual data processing
JP2012108429A (en) Voice selection device, utterance selection device, voice selection system, method for selecting voice, and voice selection program
US20230015112A1 (en) Method and apparatus for processing speech, electronic device and storage medium
US11775510B1 (en) System and method for modeling a search query
US11783828B2 (en) Combining responses from multiple automated assistants
US20230298580A1 (en) Emotionally Intelligent Responses to Information Seeking Questions
US20240013782A1 (en) History-Based ASR Mistake Corrections
US11935533B1 (en) Content-related actions based on context
US11756533B2 (en) Hot-word free pre-emption of automated assistant response presentation
US20230113883A1 (en) Digital Signal Processor-Based Continued Conversation
US20230267949A1 (en) Streaming Vocoder
Faruqui et al. Improved Contextual Grounding by Combining Multiple Speech Transcription Hypotheses
CN114302028A (en) Word extraction method, word extraction device, electronic equipment, storage medium and program product