WO2024137122A1 - Multi-stage machine learning model chaining - Google Patents

Multi-stage machine learning model chaining

Info

Publication number
WO2024137122A1
WO2024137122A1 · PCT/US2023/081254 · US2023081254W
Authority
WO
WIPO (PCT)
Prior art keywords
skill
model
output
prompt
chain
Prior art date
Application number
PCT/US2023/081254
Other languages
English (en)
Inventor
Samuel Edward SCHILLACE
Umesh Madan
Devis LUCATO
Original Assignee
Microsoft Technology Licensing, Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US18/122,575 external-priority patent/US20240202582A1/en
Application filed by Microsoft Technology Licensing, LLC
Publication of WO2024137122A1

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/90335Query processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • G06N5/022Knowledge engineering; Knowledge acquisition

Definitions

  • Singular evaluations with a machine learning model may have limited utility, especially for more complex tasks and in instances where the user is unfamiliar with the machine learning model and/or the task at hand. Accordingly, such evaluations may result in a diminished user experience, increased user frustration, and/or wasted computational resources, among other detriments.
  • aspects of the present application relate to multi-stage machine learning model chaining, where a skill chain comprised of a set of ML model evaluations with which to process an input is generated and used to ultimately produce a model output accordingly.
  • Each ML model evaluation corresponds to a “model skill” of the skill chain.
  • a model skill has an associated prompt template, which is used to generate a prompt (e.g., including input and/or context) that is processed using a corresponding ML model to generate model output accordingly.
  • an ML model associated with a model skill need not have an associated prompt template, as may be the case when prompting is not used by the ML model when processing input to generate model output.
  • Intermediate output that is generated by a first ML evaluation for a first model skill of the skill chain may subsequently be processed as input to a second ML evaluation for a second model skill of the skill chain, thereby ultimately generating model output for the given input.
  • a skill chain can include any number of skills according to any of a variety of structures, and the skills need not be evaluations using the same ML model.
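  • The sequential chaining described above can be sketched as follows; this is an illustrative Python sketch (not part of the application), in which each model skill is stood in for by a plain function mapping input text to output text:

```python
from typing import Callable, List

# A "model skill" is stood in for by any callable that maps an input
# string to an output string (in practice, an ML model evaluation).
Skill = Callable[[str], str]

def evaluate_chain(skills: List[Skill], user_input: str) -> str:
    """Feed the input through each skill in order; each skill's output
    is intermediate output that becomes the next skill's input."""
    output = user_input
    for skill in skills:
        output = skill(output)
    return output

# Two toy "skills" standing in for ML model evaluations.
summarize = lambda text: text.split(".")[0]  # keep the first sentence
uppercase = lambda text: text.upper()        # transform the summary

result = evaluate_chain([summarize, uppercase], "hello world. more text.")
# result == "HELLO WORLD"
```

The output of the last skill is the ultimate model output; every earlier output is intermediate output in the sense described above.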
  • Figure 1 illustrates an overview of an example system in which multi-stage machine learning model chaining may be used according to aspects of the present disclosure.
  • Figure 2 illustrates an overview of an example conceptual diagram for processing a user input to generate model output using chained machine learning models according to aspects described herein.
  • Figure 3 illustrates an overview of an example method for processing a user input to generate model output according to aspects described herein.
  • Figure 4 illustrates an overview of an example method for processing user input according to a prompt using a generative ML model according to aspects described herein.
  • Figures 5A and 5B illustrate overviews of an example generative machine learning model that may be used according to aspects described herein.
  • Figure 6 is a block diagram illustrating example physical components of a computing device with which aspects of the disclosure may be practiced.
  • FIG. 7 is a simplified block diagram of a computing device with which aspects of the present disclosure may be practiced.
  • FIG. 8 is a simplified block diagram of a distributed computing system in which aspects of the present disclosure may be practiced.
  • a machine learning (ML) model produces model output based on an input (e.g., as may be received from a user). For example, natural language input from a user is processed using a generative ML model to produce model output for the natural language input accordingly.
  • a singular evaluation may result in reduced utility, especially in instances where a user is inexperienced/unfamiliar with the ML model (e.g., such that the user may provide input that results in limited utilization of the ML model and/or causes the ML model to behave unexpectedly).
  • use of an ML model through a singular evaluation may limit the tasks for which the model may be used, among other detriments.
  • aspects of the present application relate to multi-stage machine learning model chaining, where an input is processed using a set of ML model evaluations (e.g., that are chained together in a “skill chain”) to ultimately produce a model output for a given input.
  • Each ML model evaluation corresponds to a “model skill” of the skill chain.
  • a model skill has an associated prompt template, which is used to generate a prompt (e.g., including input and/or context) that is processed using a corresponding ML model to generate model output accordingly.
  • an ML model associated with a model skill need not have an associated prompt template, as may be the case when prompting is not used by the ML model when processing input to generate model output.
  • Intermediate output generated by a first skill of a skill chain may subsequently be processed by a second skill to generate subsequent output accordingly.
  • a skill chain can include any number of skills and it will be appreciated that model skills need not be associated with the same ML model.
  • a generative model may generate natural language output, while a recognition model (or any of a variety of other types of ML models) may process intermediate output from the generative model to produce model output accordingly.
  • Output of the recognition model may be provided as ultimate model output or may be intermediate output that is processed using another skill of the skill chain.
  • a generative model (also generally referred to herein as a type of ML model) used according to aspects described herein may generate any of a variety of output types (and may thus be a multimodal generative model, in some examples) and may be, in some examples, a generative transformer model, a large language model (LLM), and/or a generative image model.
  • Example ML models include, but are not limited to, Megatron-Turing Natural Language Generation model (MT-NLG), Generative Pre-trained Transformer 3 (GPT-3), Generative Pre-trained Transformer 4 (GPT-4), BigScience BLOOM (Large Open-science Open-access Multilingual Language Model), DALL-E, DALL-E 2, Stable Diffusion, or Jukebox.
  • Intermediate output includes, but is not limited to, natural language output, image data output, video data output, programmatic output, and/or binary output. It will therefore be appreciated that intermediate output may have any number of output “streams” (e.g., each having an associated content type).
  • the intermediate output comprises structured output, which may include one or more tags, key/value pairs, and/or metadata, among other examples.
  • a stream may be denoted according to an associated tag within such structured output.
  • a prompt template of a skill defines or otherwise includes an indication relating to such structured output, thereby causing the generative ML model associated with the skill to produce structured output accordingly.
  • use of structured output may increase the degree to which model output is deterministic and may therefore improve reliability when chaining multiple ML model evaluations together according to aspects described herein.
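  • As an illustrative sketch (the tag names and parsing approach are assumptions, not from the application), structured intermediate output with tagged streams might be split apart deterministically like so:

```python
import re

def parse_streams(structured_output: str) -> dict:
    """Split structured model output into named streams, where each
    stream is denoted by an XML-style tag within the output."""
    return {tag: body.strip()
            for tag, body in re.findall(r"<(\w+)>(.*?)</\1>", structured_output, re.S)}

# Hypothetical structured output from a model skill with two streams.
raw = "<answer>42</answer><rationale>doubled 21</rationale>"
streams = parse_streams(raw)
# streams == {"answer": "42", "rationale": "doubled 21"}
```

Because a downstream skill can address a stream by its tag, the hand-off between chained evaluations becomes more deterministic than passing free-form text.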
  • intermediate output may be similar to ultimate model output that may otherwise be provided to a user or for further processing by an application, among other examples.
  • skills may be chained together according to any of a variety of techniques.
  • a skill chain may include one or more sequential skills, a hierarchical set of skills, a set of parallel skills, and/or a skill that is dependent on or otherwise processes output from two or more skills, among other examples.
  • a skill chain is a graph or may be arranged according to any of a variety of other data structures.
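  • One possible representation of such a skill chain (an illustrative sketch, not from the application) is a dependency graph evaluated in topological order, so that a skill consuming output from two or more prior skills runs only after they complete:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each key is a skill; its value is the set of skills it depends on.
# Skill "c" processes the outputs of both "a" and "b" (a join).
dependencies = {"a": set(), "b": set(), "c": {"a", "b"}}

# Toy skills: each receives a list of prior outputs (empty for roots).
skills = {
    "a": lambda prior: "A-out",
    "b": lambda prior: "B-out",
    "c": lambda prior: "+".join(sorted(prior)),  # combine prior outputs
}

outputs = {}
for name in TopologicalSorter(dependencies).static_order():
    prior = [outputs[dep] for dep in sorted(dependencies[name])]
    outputs[name] = skills[name](prior)
# outputs["c"] == "A-out+B-out"
```

Sequential and parallel chains are special cases of this graph form: a straight line of dependencies and a set of nodes with no edges between them, respectively.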
  • a skill chain may include any of a variety of other types of skills.
  • one or more model skills may be chained together with a programmatic skill.
  • a programmatic skill may read the content of a file, obtain data from a data source and/or from a user, send an electronic message containing model output, create a file containing model output, and/or execute programmatic output that is generated by a model skill.
  • a skill library stores model and/or programmatic skills, from which a set of skills may be identified and used to generate a skill chain accordingly (e.g., thereby performing a set of associated ML model evaluations).
  • a chain orchestrator may extract an intent from a given input (e.g., from a user and/or an application), which may be mapped to one or more skills of the skill library, thereby generating a chain of model and/or programmatic skills with which to process the input.
  • new skills are added to the skill library (e.g., by an application or by a user or developer), such that they may be dynamically identified and used as part of a skill chain with which to process a given input according to aspects described herein.
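  • A minimal sketch of such a skill library with dynamic registration (illustrative; keyword matching here stands in for ML-based intent extraction, and all names are hypothetical):

```python
# Minimal sketch of a skill library with dynamic registration and
# intent-based lookup.
skill_library = {}

def register_skill(name, keywords, fn):
    """Register a new skill so it can be identified for future chains."""
    skill_library[name] = {"keywords": set(keywords), "fn": fn}

def build_chain(user_input):
    """Map words of the input to registered skills, yielding a chain."""
    words = set(user_input.lower().split())
    return [entry["fn"] for entry in skill_library.values()
            if entry["keywords"] & words]

register_skill("translate", {"translate"}, lambda s: f"translated({s})")
register_skill("summarize", {"summarize", "shorten"}, lambda s: f"summary({s})")

chain = build_chain("please summarize this document")
# chain contains only the "summarize" skill
```

A newly registered skill is immediately visible to `build_chain`, mirroring the dynamic identification described above.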
  • a skill may include or otherwise be associated with a prompt template.
  • One or more fields, regions, and/or other parts of the prompt template may be populated (e.g., with input and/or context), thereby generating a prompt to be processed by an ML model according to aspects described herein.
  • the prompt is used to prime the ML model, thereby inducing the model to generate output corresponding to the prompt template.
  • a prompt template may include any of a variety of data, including, but not limited to, natural language, image data, audio data, video data, and/or binary data, among other examples.
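  • For example, prompt generation from a template might be sketched as follows (the template text and field names are illustrative assumptions):

```python
from string import Template

# A hypothetical prompt template for a model skill; the $context and
# $input fields are populated at evaluation time to form the prompt.
prompt_template = Template(
    "Context: $context\n"
    "Task: Answer using only the context above.\n"
    "Input: $input"
)

prompt = prompt_template.substitute(
    context="The capital of France is Paris.",
    input="What is the capital of France?",
)
```

The populated prompt is what primes the ML model, inducing output that corresponds to the template's structure.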
  • a skill as used herein invokes processing by an ML model (e.g., according to an associated prompt template) to process a given input (e.g., as may be received from a user or as may be intermediate output from another skill).
  • context is processed as part of the ML model evaluation.
  • an input may include an indication as to context
  • the skill may define a context that is provided to the ML model, and/or a chain orchestrator (e.g., that defines and/or manages processing of a skill chain) may determine context that is used for ML model evaluation accordingly, among other examples.
  • an associated context may be shared among or otherwise used by a plurality of model skills. For example, at least a part of the context that is used for processing associated with a first model skill (or, in other examples, a plurality of model skills) may be used by a second model skill.
  • the context is changed by a first model evaluation (e.g., of the first model skill) that occurs prior to or contemporaneously with processing by a second ML model evaluation (e.g., for the second model skill), such that the second ML model evaluation uses the updated context accordingly.
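  • A sketch of such shared, mutable context between two model skills (illustrative; the skills here are plain functions standing in for ML model evaluations):

```python
def skill_one(text, context):
    """First evaluation: records a fact into the shared context."""
    context["topic"] = text.split()[0]  # updates context before skill two runs
    return f"analyzed {text}"

def skill_two(text, context):
    """Second evaluation: uses the context updated by skill one."""
    return f"{text} (topic: {context['topic']})"

shared_context = {}
intermediate = skill_one("weather report", shared_context)
final = skill_two(intermediate, shared_context)
# final == "analyzed weather report (topic: weather)"
```

Because both skills read and write the same context object, the second evaluation observes the first evaluation's update, as described above.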
  • the skill chain itself may be managed, orchestrated, and/or derived by an ML model (e.g., by a generative ML model based on natural language input that is received from a user and/or input that is generated by or otherwise received from an application). Additionally, given different ML models may be chained together (e.g., which may each generate a different type of model output), the resulting model output may be output that would not otherwise be produced as a result of processing by a single ML model.
  • FIG. 1 illustrates an overview of an example system 100 in which multi-stage machine learning model chaining may be used according to aspects of the present disclosure.
  • system 100 includes machine learning service 102, computing device 104, and network 106.
  • machine learning service 102 and computing device 104 communicate via network 106, which may comprise a local area network, a wireless network, or the Internet, or any combination thereof, among other examples.
  • machine learning service 102 includes chain orchestrator 108, model repository 110, skill library 112, and semantic memory store 114.
  • machine learning service 102 receives a request from computing device 104 (e.g., from multi-stage machine learning framework 118) to generate model output, which may be generated using a skill chain as described herein.
  • the request may include an input (e.g., as may be a user input that was received from a user at computing device 104 and/or that was generated by application 116).
  • chain orchestrator 108 may identify one or more ML models from model repository 110 and process the input accordingly.
  • chain orchestrator 108 processes the request to generate a skill chain with which to generate the model output (e.g., using one or more models of model repository 110).
  • chain orchestrator 108 uses a generative ML model to process at least a part of the input (e.g., in conjunction with a prompt that was generated from a prompt template using the input), thereby generating a skill chain that includes one or more model skills (and, in some examples, one or more programmatic skills) accordingly.
  • Chain orchestrator 108 then processes the resulting skill chain according to aspects described herein, for example using one or more models of model repository 110, skills of skill library 112 and/or 120, and/or context from semantic memory store 114 and/or 122.
  • the request includes a prompt that is used to prime the ML model (e.g., that was generated using a prompt template for a skill from skill library 120 of computing device 104).
  • the request includes an indication of a skill in skill library 112, such that chain orchestrator 108 generates a corresponding prompt based on the skill from skill library 112 accordingly.
  • the request includes a context with which the request is to be processed (e.g., from semantic memory store 122 of computing device 104).
  • the request includes an indication of context in semantic memory store 114, such that chain orchestrator 108 obtains the context from semantic memory store 114 accordingly.
  • multi-stage machine learning framework 118 performs aspects similar to chain orchestrator 108, such that multi-stage machine learning framework 118 generates a skill chain and/or manages processing of a skill chain accordingly.
  • chain orchestrator 108 obtains additional information that is used when processing a request (e.g., as may be obtained from a remote data source or as may be requested from a user of computing device 104). For instance, chain orchestrator 108 may determine to obtain additional information for a given evaluation of a skill chain, among other examples. As an example, additional information may be obtained via a programmatic skill (e.g., as may have been included in a skill chain by chain orchestrator 108). Examples of such aspects are discussed in greater detail below with respect to methods 300 and 400 of Figures 3 and 4, respectively.
  • Model repository 110 may include any number of different ML models.
  • model repository 110 may include foundation models, language models, speech models, video models, and/or audio models.
  • a foundation model is a model that is pre-trained on broad data that can be adapted to a wide range of tasks (e.g., models capable of processing various different tasks or modalities).
  • a multimodal machine learning model of model repository 110 may have been trained using training data having a plurality of content types. Thus, given content of a first type, an ML model of model repository 110 may generate content having any of a variety of associated types.
  • model repository 110 may include a foundation model as well as a model that has been finetuned (e.g., for a specific context and/or a specific user or set of users), among other examples.
  • computing device 104 includes application 116, multi-stage machine learning framework 118, skill library 120, and semantic memory store 122.
  • application 116 uses multi-stage machine learning framework 118 to process user input and generate model output accordingly, which may be presented to a user of computing device 104 and/or used for subsequent processing by application 116, among other examples.
  • aspects of multi-stage machine learning framework 118 are similar to those of chain orchestrator 108 and are therefore not necessarily redescribed in detail.
  • multi-stage machine learning framework 118 may generate and/or manage evaluation of a skill chain according to aspects described herein.
  • multi-stage machine learning framework 118 provides an indication of user input to machine learning service 102, such that a skill chain is generated by machine learning service 102 and is received by computing device 104 in response.
  • multi-stage machine learning framework 118 manages the evaluation of the skill chain (e.g., generating subsequent requests to machine learning service 102 for constituent model skills) according to one or more associated prompt templates (e.g., as may be stored by skill library 112/120) and/or based on associated context (e.g., from semantic memory store 114/122).
  • multi-stage machine learning framework 118 requests model output from machine learning service 102 for model skills of the skill chain, while a programmatic skill of the skill chain may be processed local to (or, in other examples, remote from) computing device 104.
  • skill chain generation/orchestration and/or prompt generation may be performed client side (e.g., by multi-stage machine learning framework 118), server side (e.g., by chain orchestrator 108), or any combination thereof, among other examples.
  • multi-stage machine learning framework 118 may perform a first ML evaluation associated with a first model skill stored by skill library 120 of computing device 104, while a second ML evaluation is performed by machine learning service 102 based on a second model skill that is stored by skill library 112.
  • Multi-stage machine learning framework 118 may be provided as part of an operating system of computing device 104 (e.g., as a service, an application programming interface (API), and/or a framework), may be made available as a library that is included by application 116 (or may be more directly incorporated by an application), or may be provided as a standalone application, among other examples.
  • a user interface is provided via which a user may interact with a multi-stage machine learning framework and/or chain orchestrator.
  • machine learning service 102 may additionally, or alternatively, implement aspects similar to multi-stage machine learning framework 118, such that machine learning service 102 provides a website via which a user may interact with a console or terminal interface of the multi-stage machine learning framework accordingly.
  • the console may include a text-based user interface via which a user inputs skills (e.g., model skills and/or programmatic skills) that may be chained together.
  • skills may be chained together using a pipe ("|") operator, thereby piping output of one skill (e.g., a first model skill) as input to another skill (e.g., a second model skill).
  • inputs and/or outputs of one or more skills may additionally, or alternatively, be redirected according to any of a variety of other techniques (e.g., using "<<" and/or ">>" operators).
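  • Such pipe-style chaining might be interpreted as follows (an illustrative sketch; the skill names and command line are hypothetical):

```python
def run_pipeline(command_line, initial_input, skills):
    """Interpret a shell-style 'skillA | skillB' line by piping each
    skill's output into the next skill as its input."""
    text = initial_input
    for stage in command_line.split("|"):
        text = skills[stage.strip()](text)
    return text

# Toy skills standing in for model skills in the console.
demo_skills = {
    "summarize": lambda s: s.split(".")[0],   # keep the first sentence
    "translate": lambda s: f"fr({s})",        # tag as "translated"
}

result = run_pipeline("summarize | translate", "Bonjour. Extra.", demo_skills)
# result == "fr(Bonjour)"
```

This mirrors the familiar shell convention: the text between pipes names a skill, and output flows left to right.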
  • computing device 104 may include a user interface that is part of an application (e.g., application 116) or a plurality of applications (e.g., as a shared framework or as functionality that is provided by an operating system of computing device 104).
  • natural language input may be provided via the user interface (e.g., as text input and/or as voice input), which may be processed according to aspects described herein and used to generate a skill chain accordingly.
  • a skill of the skill chain may interact with one or more command interfaces, each of which may be associated with an application (e.g., application 116) and/or an operating system of computing device 104, among other examples.
  • the operating system may provide a command interface via which interactions may be performed, for example through an accessibility API and/or an extensibility API.
  • a model skill may generate programmatic output that is executed, parsed, or otherwise processed (e.g., as a programmatic skill) to interact with various functionality and/or other aspects of computing device 104 (e.g., application 116, system preferences, etc.) based on the received natural language input.
  • a user of computing device 104 may use such an interface to interact with application/device functionality via multi-stage machine learning model chaining according to aspects described herein.
  • FIG. 2 illustrates an overview of an example conceptual diagram 200 for processing a user input to generate model output using chained machine learning models according to aspects described herein.
  • diagram 200 processes user input 202 according to a set of models (e.g., ML models 204 and 206, as orchestrated by chain orchestrator 203) to generate model output 208.
  • user input 202 may be received from a computing device, such as computing device 104 in Figure 1.
  • Aspects of chain orchestrator 203 may be similar to those discussed above with respect to chain orchestrator 108 and are therefore not necessarily redescribed below in detail.
  • User input 202 may include any of a variety of input, including, but not limited to, natural language input, command-line input, input that is received via a framework or an application, and/or input that is received via a central service (e.g., of an operating system) or a uniform resource identifier (URI) handler, among other examples. While examples are described herein with reference to natural language input, it will be appreciated that any of a variety of additional or alternative input types may be received, including, but not limited to, image input and/or video input. Further, natural language input may include any of a variety of input, such as text input or speech input.
  • chain orchestrator 203 processes user input 202 to generate a skill chain that includes a plurality of model skills (e.g., including an evaluation by ML model 204 and ML model 206) to ultimately generate model output 208 according to aspects described herein.
  • Such aspects may be similar to those discussed above with respect to chain orchestrator 108, such that user input 202 is processed to extract an intent that is mapped to one or more skills (e.g., of skill library 212).
  • an orchestration prompt may be generated by chain orchestrator 203, which includes an indication of one or more skills from skill library 212 (which is also referred to herein as a “skill listing”) and at least a part of user input 202, such that the generative ML model generates a skill chain with which user input 202 is processed.
  • chain orchestrator 203 maps one or more intents of user input 202 to one or more model and/or programmatic skills of skill library 212 accordingly. Additional examples of these and other aspects of chain orchestrator 203 are discussed below with respect to operation 304 of method 300 in Figure 3.
  • skill listing is dynamically generated.
  • skill library 212 may include one or more files that each define one or more skills with which input (e.g., user input and/or intermediate output) may be processed.
  • skill library 212 includes a database that stores a listing of skills. In some instances, a new skill may be registered (e.g., in the database or in an index), thereby indicating that the skill is available for use as part of a skill chain.
  • plug-in application 214 (e.g., aspects of which may be similar to application 116) may include one or more skills that are registered within skill library 212, such that processing using a skill of plug-in application 214 may be performed according to aspects described herein.
  • chain orchestrator 203 lists the content of skill library 212 when generating the skill listing.
  • a skill may include or otherwise have an associated description of its functionality (e.g., a manual page or usage information, such as syntax and/or an indication of one or more inputs/outputs), at least a part of which may be included in the skill listing that is generated by chain orchestrator 203 and used to generate the skill chain accordingly.
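  • A sketch of assembling such a skill listing into an orchestration prompt (the catalog entries and prompt wording are illustrative assumptions, not from the application):

```python
# Hypothetical skill metadata: each entry pairs a skill name with a
# short usage description, as might be drawn from a manual page.
skill_catalog = {
    "summarize": "summarize <text>: produce a one-sentence summary",
    "extract_dates": "extract_dates <text>: list dates found in the text",
}

def build_orchestration_prompt(user_input, catalog):
    """Assemble a skill listing plus the user input, forming the
    orchestration prompt handed to a generative model that plans
    the skill chain."""
    listing = "\n".join(f"- {desc}" for desc in catalog.values())
    return f"Available skills:\n{listing}\n\nUser request: {user_input}"

prompt = build_orchestration_prompt("When is the meeting?", skill_catalog)
```

The generative model's reply to such a prompt would name the skills to chain, which the orchestrator then evaluates in order.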
  • Chain orchestrator 203 thus manages processing of user input 202 according to the generated skill chain.
  • a skill chain may include one or more sequential skills, a hierarchical set of skills, a set of parallel skills, and/or a skill that is dependent on or otherwise processes output from two or more prior skills, among other examples.
  • the evaluation order of the skill chain is determined based on the available skills of skill library 212.
  • the skill chain includes one or more programmatic skills, though the instant example is an example in which two model skills are used, corresponding to ML models 204 and 206.
  • once a skill chain is generated by chain orchestrator 203, user input 202 is processed by ML model 204 to generate intermediate output.
  • a prompt template corresponding with a first model skill may be populated or otherwise processed to generate a prompt (e.g., including at least a part of user input 202 and/or context from semantic memory store 218) that is processed by ML model 204 accordingly.
  • intermediate output from ML model 204 is then processed by ML model 206.
  • ML models 204 and 206 may each be the same or a similar model (e.g., generating the same content type(s) and/or trained using similar training data) or, as another example, ML models 204 and 206 may each be different models (e.g., generating different sets of content types).
  • ML models 204 and 206 may each use a skill from skill library 212 (e.g., as may have been determined or otherwise identified by chain orchestrator 203 according to aspects described herein).
  • ML model 204 and ML model 206 each use context obtained from recall engine 210, as may be stored by semantic memory store 218. For example, it may be determined (e.g., by chain orchestrator 203 and/or by ML model 204 or 206) that processing associated with a skill should be performed according to context from recall engine 210. In other examples, a skill from skill library 212 may indicate (e.g., as part of an associated prompt template) that context should be obtained from semantic memory store 218, such that recall engine 210 is used to obtain such context accordingly. Thus, it will be appreciated that context may be obtained for a skill as a result of any of a variety of determinations and/or indications, among other examples.
  • semantic memory store 218 stores semantic embeddings (also referred to herein as “semantic addresses”) associated with ML model 204 and/or ML model 206, each of which may correspond to one or more content objects.
  • semantic memory store 218 includes one or more semantic embeddings corresponding to a context object and/or the context object itself or a reference to the context object, among other examples.
  • semantic memory store 218 stores embeddings that are associated with one or more models (e.g., ML model 204 and/or 206) and their specific versions, which may thus represent the same or similar content but in varying semantic embedding spaces (e.g., as is associated with each model/version). Further, when a new model is added or an existing model is updated, one or more entries within semantic memory store 218 may be reencoded (e.g., by generating a new semantic embedding according to the new embedding space).
  • a single content object entry within semantic memory store 218 may have a locatable semantic address across models/versions, thereby enabling retrieval of content objects based on a similarity determination (e.g., as a result of an algorithmic comparison) between a corresponding semantic address and a semantic context indication.
  • an input embedding may be generated (e.g., as may be associated with user input 202 and/or processing by ML model 204 or 206).
  • the input embedding may be generated by a machine learning model that encodes an intent corresponding to user input 202 accordingly.
  • the input embedding may be generated based on any of a variety of other input (e.g., audio and/or visual input) that is received by a computer. Additional and/or alternative methods for generating an input embedding may be recognized by those of skill in the art.
  • Recall engine 210 may thus identify one or more content objects that are provided as context for processing associated with a skill based on the input embedding. For example, a set of semantic embeddings that match the input embedding (e.g., using cosine distance, another geometric n-dimensional distance function, or other algorithmic similarity metric) may be identified and used to identify one or more corresponding content objects accordingly. As noted above, processing by ML model 204 and/or ML model 206 may add, remove, or otherwise modify one or more entries of semantic memory store 218, such that context from recall engine 210 that is used by a subsequent skill may be affected by one or more previous skills.
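  • A minimal sketch of such similarity-based recall (illustrative; real semantic embeddings are high-dimensional, whereas 2-d vectors are used here for brevity):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Toy semantic memory: each content object keyed to its embedding
# (its "semantic address").
memory = {
    "meeting notes": [1.0, 0.0],
    "vacation photos": [0.0, 1.0],
}

def recall(input_embedding, store, top_k=1):
    """Return the top-k content objects whose semantic addresses best
    match the input embedding."""
    ranked = sorted(store.items(),
                    key=lambda kv: cosine_similarity(input_embedding, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]

best = recall([0.9, 0.1], memory)
# best == ["meeting notes"]
```

Because the store is keyed by embedding, a skill that adds or modifies entries changes what a later skill recalls, as noted above.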
  • model output 208 is generated.
  • one or more model skills of a skill chain may each generate intermediate output (e.g., structured output), while a final skill of the skill chain (e.g., an ML model evaluation by ML model 206, as illustrated) may generate model output 208 based on such intermediate output.
  • final model output produced by ML model 206 includes, but is not limited to, natural language output, speech and/or audio output, image output, video output, and/or programmatic output.
  • Diagram 200 is illustrated in an example where the skill chain includes two model skills (e.g., corresponding to ML model 204 and ML model 206). Arrow 216 is provided to illustrate that, in other examples, additional model skills may be included (e.g., associated with ML model 204, ML model 206, and/or any of a variety of other ML models, not pictured). Further, while diagram 200 depicts an example in which model skills are sequential, it will be appreciated that parallel skills, hierarchical ML skills, and/or skills that depend on output from multiple prior skills may be used in other instances, among other examples. Additionally, as noted above, a skill chain may further include one or more programmatic skills in other examples.
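A sequential skill chain of the kind diagram 200 depicts can be sketched as follows; the two skills here are hypothetical toy functions standing in for evaluations by ML model 204 and ML model 206, with the first skill producing structured intermediate output that the second consumes.

```python
def run_skill_chain(skills, user_input):
    # Evaluate each skill in order: intermediate output from one skill
    # becomes the input to the next, and the final skill's output is
    # returned as the model output.
    value = user_input
    for skill in skills:
        value = skill(value)
    return value

# Hypothetical model skills standing in for ML model 204 and ML model 206.
def extract_topic(text):
    # First skill: structured intermediate output.
    return {"topic": text.split()[-1]}

def summarize(structured):
    # Final skill: model output based on the intermediate output.
    return f"A summary about {structured['topic']}"

output = run_skill_chain([extract_topic, summarize], "tell me about transformers")
```

Parallel or hierarchical chains would replace the simple loop with a dependency graph, but the pass-intermediate-output-forward idea is the same.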
  • Figure 3 illustrates an overview of an example method 300 for processing a user input to generate model output according to aspects described herein.
  • aspects of method 300 are performed by a chain orchestrator (e.g., chain orchestrator 108 in Figure 1 or chain orchestrator 203 in Figure 2) and/or by a multi-stage machine learning framework (e.g., multistage machine learning framework 118), among other examples.
  • method 300 begins at operation 302, where user input is received.
  • the user input may be received from a computing device (e.g., computing device 104 in Figure 1), as may be the case when aspects of method 300 are performed by a machine learning service (e.g., machine learning service 102 in Figure 1).
  • the user input is received from an application (e.g., application 116), from a service, or from other software of a computing device (e.g., computing device 104), as may be the case when aspects of method 300 are performed by a multi-stage machine learning framework (e.g., multi-stage machine learning framework 118).
  • the received input may be similar to user input 202 discussed above with respect to Figure 2.
  • the received user input may include natural language input (e.g., text and/or speech input), image input, and/or video input, among any of a variety of other inputs.
  • Method 300 progresses to operation 304, where the received input is processed to generate a skill chain (e.g., corresponding to a set of ML evaluations and, in some examples, further corresponding to a set of programmatic evaluations, according to aspects described herein).
  • a prompt is generated that is processed by a generative ML model to generate the skill chain accordingly.
  • the prompt may be generated based on a prompt template that is populated to include at least a part of the input that was received at operation 302.
  • the prompt template is further populated with a skill listing (e.g., from a skill library, such as skill library 112, 120, and/or 212 in Figures 1 and 2).
  • a chain orchestrator (e.g., chain orchestrator 108) of a machine learning service generates the skill chain.
  • a request is provided to the machine learning service to process the input, such that an indication of the skill chain is received in response, as may be the case when aspects of method 300 are performed by a multi-stage machine learning framework of a client computing device (e.g., computing device 104 in Figure 1).
  • at least a part of such skill chain generation is performed local to the computing device, as may be the case when a generative ML model for performing such aspects is locally available.
  • the generated skill chain may include one or more sequential skills, a hierarchical set of skills, a set of parallel skills, and/or a skill that is dependent on or otherwise processes output from two or more prior skills, among other examples.
  • a semantic store similar to semantic memory store 218 may additionally, or alternatively, be used to store one or more embeddings associated with a skill of a skill library (e.g., as may be generated based on an associated skill description, manual page, and/or at least a part of an associated prompt template).
  • an input embedding may be generated for the input that was received at operation 302 (e.g., thereby indicating one or more associated intents) and used to identify one or more skills having associated embeddings that match the input embedding (similar to aspects discussed above with respect to recall engine 210).
  • the identified skills may thus form a skill chain accordingly.
  • a context may be provided to the generative ML model when generating the skill chain (e.g., which may be included as part of the generated prompt), as may be determined by a recall engine from a semantic memory engine, similar to recall engine 210 and semantic memory store 218 discussed above with respect to Figure 2.
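The prompt-based chain generation at operation 304 can be illustrated with a small sketch: a planner prompt template is populated with the user input and a listing of skills from the skill library. The template text, skill names, and descriptions below are hypothetical examples, not content from the disclosure.

```python
PLANNER_TEMPLATE = (
    "Available skills:\n{skill_listing}\n\n"
    "User request: {user_input}\n"
    "Respond with an ordered list of skill names to invoke."
)

def build_planner_prompt(skill_library, user_input):
    # Populate the prompt template with a listing of skills from the
    # skill library and at least part of the received user input.
    listing = "\n".join(
        f"- {name}: {meta['description']}" for name, meta in skill_library.items()
    )
    return PLANNER_TEMPLATE.format(skill_listing=listing, user_input=user_input)

# Hypothetical skill library entries.
skill_library = {
    "search": {"description": "retrieve documents for a query"},
    "summarize": {"description": "summarize provided text"},
}
prompt = build_planner_prompt(skill_library, "summarize recent news about solar power")
```

The resulting prompt would then be processed by a generative ML model, whose output indicates the ordered set of skills that form the chain.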
  • Flow progresses to operation 306, where a skill is selected from the skill chain that was generated at operation 304.
  • the skill chain that was generated at operation 304 indicates an order, a hierarchy, and/or one or more interdependencies for constituent skills, such that a skill is selected at operation 306 accordingly.
  • a skill chain may, in some examples, include a programmatic skill where any of a variety of processing is performed by a computing device. Accordingly, if it is determined that the selected skill is a programmatic skill, flow branches “YES” to operation 309, where the programmatic skill is processed.
  • operation 309 comprises executing a command, obtaining additional information (e.g., from a user or from a data source), and/or affecting operation of an operating system or an application (e.g., via an API or a command interface), among other examples.
  • performing the programmatic skill at operation 309 may comprise executing programmatic output that was generated by a machine learning model according to aspects described herein. It will therefore be appreciated that any of a variety of programmatic operations may be performed when evaluating a skill chain. Flow then progresses to determination 314, which is discussed below.
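The branch between programmatic and model skills can be sketched as a simple dispatch loop; the chain entries, `fake_model` callable, and prompt template are hypothetical stand-ins for the processing at operations 309 and 312.

```python
def evaluate_chain(chain, user_input, run_model):
    # Walk the skill chain: programmatic skills are executed directly,
    # while model skills are evaluated by a generative model (run_model).
    value = user_input
    for skill in chain:
        if skill["kind"] == "programmatic":
            value = skill["fn"](value)
        else:
            value = run_model(skill["prompt_template"].format(input=value))
    return value

# Hypothetical chain: a programmatic normalization step, then a model skill.
chain = [
    {"kind": "programmatic", "fn": str.strip},
    {"kind": "model", "prompt_template": "Answer briefly: {input}"},
]

def fake_model(prompt):
    # Placeholder for an ML model evaluation.
    return f"[model output for: {prompt}]"

result = evaluate_chain(chain, "  what is ML?  ", fake_model)
```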
  • if it is instead determined that the selected skill is a model skill, flow branches “NO” to determination 308, where it is determined whether to obtain context from a semantic memory store (e.g., semantic memory store 114, 122, and/or 210 in Figures 1 and 2).
  • the determination may be based on a prompt template corresponding to the selected model skill.
  • the prompt template may indicate that context should be obtained from the semantic memory store and/or may include an indication as to what context should be obtained, if available.
  • it may be automatically determined to recall context from the semantic memory store, as may be determined based on previous model skills that used the same or a similar prompt.
  • context may be obtained from a semantic memory store for a model skill as a result of any of a variety of determinations and/or indications, among other examples.
  • at operation 310, context is generated based on the semantic memory store.
  • an input semantic embedding is generated based on the user input and/or the prompt template for which the ML evaluation is to be performed, such that one or more matching semantic embeddings may be identified from the semantic memory store.
  • Content corresponding to the identified semantic embedding(s) is retrieved and used as context for the ML evaluation of the model skill accordingly.
  • the retrieved content may be included in a prompt that is generated according to the prompt template.
  • context may be obtained from any of a variety of sources, including, but not limited to, a user’s computing device (e.g., computing device 104 in Figure 1) and/or a machine learning service (e.g., machine learning service 102), among other examples.
  • flow instead branches “NO” from determination 308 to operation 312, which is discussed below.
  • a prompt may be generated based on a prompt template, such that the prompt includes at least a part of the input and, in some examples, the generated context. It will be appreciated that, in other examples, an ML model associated with a model skill may not use prompting. Similar to operation 304, a request for ML processing may be provided to the machine learning service in instances where the skill chain generation aspects of method 300 are performed local to a client computing device, such that generated output is received from the machine learning service in response.
  • the generated output (e.g., as may be received as a response from the machine learning service) may be intermediate output, which, for example, includes structured output that is generated as a result of the prompt including an indication as to such structured output, as noted above. Additional example aspects of operation 312 are discussed below with respect to method 400 of Figure 4.
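When a model skill is prompted for structured intermediate output, a subsequent skill must be able to consume it. One common (though not disclosure-specified) choice is JSON; the sketch below shows such parsing with a plain-text fallback, using hypothetical field names.

```python
import json

def parse_intermediate_output(raw_model_output):
    # A model skill prompted for structured output may return JSON;
    # parse it so a subsequent skill can consume individual fields.
    try:
        return json.loads(raw_model_output)
    except json.JSONDecodeError:
        # Fall back to treating the output as plain natural language.
        return {"text": raw_model_output}

structured = parse_intermediate_output('{"intent": "summarize", "topic": "solar power"}')
fallback = parse_intermediate_output("plain natural language output")
```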
  • At determination 314, it is determined whether there is a remaining skill in the skill chain that was generated at operation 304.
  • the skill chain is updated as a result of operation 309 and/or 312 described above.
  • Determination 314 may comprise evaluating the skill chain (e.g., as was generated at operation 304 and/or as may have been updated as a result of operation 309 and/or 312) to determine whether there is a skill that has not yet been processed. If it is determined that there is not a remaining skill, flow branches “NO” to operation 316, which is discussed below.
  • Subsequent iterations of operation 312 may use generated output of a previous iteration of operation 309 and/or 312 as input to a model skill when generating subsequent model output.
  • subsequent iterations of operation 309 may use generated output of a previous iteration of operation 309 and/or operation 312, in some examples.
  • at least a part of the received user input is used as input for a subsequent iteration of operation 309 and/or 312.
  • one or more contexts may be chained together as a result of subsequent iterations of operation 310 in some examples.
  • a context corresponding to a previous ML evaluation (e.g., as may have been generated by a previous iteration of operation 310 and/or updated by a previous iteration of operation 312) may be used as context for a subsequent ML evaluation by operation 312.
  • method 300 arrives at operation 316, where an indication of the generated output is provided.
  • the indication may be provided to a client computing device, as may be the case in instances where aspects of method 300 are performed by a chain orchestrator of the machine learning platform. Additionally, or alternatively, the indication may be provided by a multi-stage machine learning framework of the client computing device.
  • the indication is provided to an application (e.g., application 116 in Figure 1) for subsequent processing.
  • an indication of at least a part of the generated output is provided to a user of the computing device.
  • the resulting output may include any of a variety of content, including, but not limited to, natural language output, speech and/or audio output, image output, video output, and/or programmatic output.
  • Method 300 terminates at operation 316.
  • Figure 4 illustrates an overview of an example method 400 for processing user input according to a prompt using a generative ML model (also referred to herein as an ML model evaluation) according to aspects described herein.
  • aspects of method 400 are performed as part of operation 312 discussed above with respect to method 300 of Figure 3.
  • method 400 begins at operation 402, where input is obtained. Aspects of the obtained input may be similar to user input 202 discussed above with respect to Figure 2 or that which is received at operation 302 of method 300 in Figure 3 and are therefore not necessarily redescribed below in detail.
  • the input may be obtained from a user of a computing device (e.g., computing device 104 in Figure 1).
  • the input is received as part of a request to generate model output according to aspects described herein (e.g., as a result of performing aspects of operation 312 discussed above with respect to method 300 of Figure 3).
  • a context may be obtained. Operation 404 is illustrated using a dashed box to indicate that, in other examples, operation 404 may be omitted. Similar to operation 402, the context may be obtained as part of a request to generate model output in some examples. In other examples, the context may be obtained from a semantic memory store, as may be generated by a recall engine similar to recall engine 210 from semantic memory store 218, as was discussed above with respect to Figure 2.
  • a prompt is generated.
  • the prompt may be generated based on an indication of a model skill (e.g., corresponding to a prompt template) that was received as part of the request.
  • the prompt template may be obtained based on an association with the model skill in a skill library (e.g., skill library 112 and/or 120 in Figure 1, and/or skill library 212 in Figure 2).
  • the prompt template is processed to incorporate at least a part of the obtained input and, in some examples, the obtained context. For example, one or more fields, regions, or other parts of the prompt template may be replaced or otherwise populated with such aspects, thereby generating a prompt with which model output may be generated for a given model skill.
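Populating a prompt template at operation 406 can be sketched as simple field replacement; the template text and field names (`context`, `input`) below are hypothetical, and the optional context models what a recall engine might supply from the semantic memory store.

```python
def populate_prompt(template, user_input, context=None):
    # Replace the template's fields with the obtained input and, when
    # available, context recalled from a semantic memory store.
    context_block = "\n".join(context) if context else ""
    return template.format(context=context_block, input=user_input)

# Hypothetical prompt template with context and input fields.
TEMPLATE = "Context:\n{context}\n\nTask: {input}\nAnswer:"

prompt = populate_prompt(
    TEMPLATE,
    "explain skill chaining",
    context=["Skill chains compose ML and programmatic steps."],
)
```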
  • at operation 408, a model is determined from a set of models.
  • a skill for which the prompt was generated at operation 406 may include an indication as to a model with which the generated prompt is to be processed.
  • the received request may include such an indication.
  • the model may be identified from a model repository, such as model repository 110 in Figure 1. In other examples, such a determination need not be made, as may be the case when a machine learning service and/or an associated API via which the request was received performs processing with a single ML model.
  • at operation 410, model output is generated. In examples, operation 410 comprises processing the prompt that was generated at operation 406 according to the ML model that was determined at operation 408. Aspects of an example ML model that may be used to perform such processing are described below with respect to Figures 5A-5B.
  • at operation 412, an indication of the generated output is provided.
  • a response to a request that was received as part of operation 402, 404, and/or 406 may be generated that includes at least a part of the model output.
  • the model output may include intermediate and/or structured output, as may be the case when the request corresponds to an intermediate ML evaluation of a skill chain.
  • the indication of generated output may thus be received by the computing device, where subsequent processing may be performed accordingly (e.g., by a multistage machine learning framework and/or an application, such as multi-stage machine learning framework 118 and/or application 116, respectively).
  • Method 400 terminates at operation 412.
  • Figures 5A and 5B illustrate overviews of an example generative machine learning model that may be used according to aspects described herein.
  • conceptual diagram 500 of Figure 5A depicts an overview of pre-trained generative model package 504 that processes an input and a prompt 502 for a skill of a skill chain to generate model output for multi-stage ML model chaining 506 according to aspects described herein.
  • examples of pre-trained generative model package 504 include, but are not limited to, Megatron-Turing Natural Language Generation model (MT-NLG), Generative Pre-trained Transformer 3 (GPT-3), Generative Pre-trained Transformer 4 (GPT-4), BigScience BLOOM (Large Open-science Open-access Multilingual Language Model), DALL-E, DALL-E 2, Stable Diffusion, or Jukebox.
  • generative model package 504 is pre-trained according to a variety of inputs (e.g., a variety of human languages, a variety of programming languages, and/or a variety of content types) and therefore need not be finetuned or trained for a specific scenario. Rather, generative model package 504 may be more generally pre-trained, such that input 502 includes a prompt that is generated, selected, or otherwise engineered to induce generative model package 504 to produce certain generative model output 506. For example, a prompt includes a context and/or one or more completion prefixes that thus preload generative model package 504 accordingly.
  • generative model package 504 is induced to generate output based on the prompt that includes a predicted sequence of tokens (e.g., up to a token limit of generative model package 504) relating to the prompt.
  • the predicted sequence of tokens is further processed (e.g., by output decoding 516) to yield output 506.
  • each token is processed to identify a corresponding word, word fragment, or other content that forms at least a part of output 506.
  • input 502 and generative model output 506 may each include any of a variety of content types, including, but not limited to, text output, image output, audio output, video output, programmatic output, and/or binary output, among other examples.
  • input 502 and generative model output 506 may have different content types, as may be the case when generative model package 504 includes a generative multimodal machine learning model.
  • generative model package 504 may be used in any of a variety of scenarios and, further, a different generative model package may be used in place of generative model package 504 without substantially modifying other associated aspects (e.g., similar to those described herein with respect to Figures 1, 2, 3, and 4). Accordingly, generative model package 504 operates as a tool with which machine learning processing is performed, in which certain inputs 502 to generative model package 504 are programmatically generated or otherwise determined, thereby causing generative model package 504 to produce model output 506 that may subsequently be used for further processing.
  • Generative model package 504 may be provided or otherwise used according to any of a variety of paradigms.
  • generative model package 504 may be used local to a computing device (e.g., computing device 104 in Figure 1) or may be accessed remotely from a machine learning service (e.g., machine learning service 102).
  • aspects of generative model package 504 are distributed across multiple computing devices.
  • generative model package 504 is accessible via an application programming interface (API), as may be provided by an operating system of the computing device and/or by the machine learning service, among other examples.
  • generative model package 504 includes input tokenization 508, input embedding 510, model layers 512, output layer 514, and output decoding 516.
  • input tokenization 508 processes input 502 to generate input embedding 510, which includes a sequence of symbol representations that corresponds to input 502. Accordingly, input embedding 510 is processed by model layers 512, output layer 514, and output decoding 516 to produce model output 506.
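The tokenization/embedding/decoding stages above can be illustrated with a toy sketch. The vocabulary, embedding table, and word-level tokenization are hypothetical simplifications (real packages use learned subword tokenizers and high-dimensional embeddings); the sketch only shows the shape of the pipeline around model layers 512.

```python
# Hypothetical vocabulary mapping tokens to ids and back.
VOCAB = {"hello": 0, "world": 1, "<unk>": 2}
INVERSE = {i: t for t, i in VOCAB.items()}
EMBEDDINGS = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.5, 0.5]}

def tokenize(text):
    # Input tokenization: map each word to a token id.
    return [VOCAB.get(word, VOCAB["<unk>"]) for word in text.split()]

def embed(token_ids):
    # Input embedding: a sequence of symbol representations.
    return [EMBEDDINGS[t] for t in token_ids]

def decode(token_ids):
    # Output decoding: map predicted token ids back to content
    # (words, word fragments, or other output).
    return " ".join(INVERSE[t] for t in token_ids)

ids = tokenize("hello world")
vectors = embed(ids)
text = decode(ids)
```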
  • An example architecture corresponding to generative model package 504 is depicted in Figure 5B, which is discussed below in further detail. Even so, it will be appreciated that the architectures that are illustrated and described herein are not to be taken in a limiting sense and, in other examples, any of a variety of other architectures may be used.
  • Figure 5B is a conceptual diagram that depicts an example architecture 550 of a pre-trained generative machine learning model that may be used according to aspects described herein.
  • any of a variety of alternative architectures and corresponding ML models may be used in other examples without departing from the aspects described herein.
  • architecture 550 processes input 502 to produce generative model output 506, aspects of which were discussed above with respect to Figure 5A.
  • Architecture 550 is depicted as a transformer model that includes encoder 552 and decoder 554.
  • Encoder 552 processes input embedding 558 (aspects of which may be similar to input embedding 510 in Figure 5A), which includes a sequence of symbol representations that corresponds to input 556.
  • input 556 includes input and prompt 502 corresponding to a skill of a skill chain, aspects of which may be similar to user input 202, context from semantic memory store 218, and/or a prompt that was generated based on a prompt template of a skill from skill library 112, 120, and/or 212 according to aspects described herein.
  • encoder 552 includes example layer 570. It will be appreciated that any number of such layers may be used, and that the depicted architecture is simplified for illustrative purposes.
  • Example layer 570 includes two sub-layers: multi-head attention layer 562 and feed forward layer 566. In examples, a residual connection is included around each layer 562, 566, after which normalization layers 564 and 568, respectively, are included.
  • Decoder 554 includes example layer 590. Similar to encoder 552, any number of such layers may be used in other examples, and the depicted architecture of decoder 554 is simplified for illustrative purposes. As illustrated, example layer 590 includes three sub-layers: masked multi-head attention layer 578, multi-head attention layer 582, and feed forward layer 586. Aspects of multi-head attention layer 582 and feed forward layer 586 may be similar to those discussed above with respect to multi-head attention layer 562 and feed forward layer 566, respectively. Additionally, masked multi-head attention layer 578 performs multi-head attention over the output of encoder 552 (e.g., output 572). In examples, masked multi-head attention layer 578 prevents positions from attending to subsequent positions.
  • Such masking combined with offsetting the embeddings (e.g., by one position, as illustrated by multi-head attention layer 582), may ensure that a prediction for a given position depends on known output for one or more positions that are less than the given position.
  • residual connections are also included around layers 578, 582, and 586, after which normalization layers 580, 584, and 588, respectively, are included.
  • Multi-head attention layers 562, 578, and 582 may each linearly project queries, keys, and values using a set of linear projections to a corresponding dimension.
  • Each linear projection may be processed using an attention function (e.g., dot-product or additive attention), thereby yielding n-dimensional output values for each linear projection.
  • the resulting values may be concatenated and once again projected, such that the values are subsequently processed as illustrated in Figure 5B (e.g., by a corresponding normalization layer 564, 580, or 584).
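The scaled dot-product attention underlying these layers, including the masking performed by layer 578, can be sketched for a single head (multi-head attention runs several such heads over different linear projections and concatenates the results, as described above). The 2-dimensional query/key/value matrices are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values, masked=False):
    # For each query, score all keys (dot product scaled by sqrt(d)),
    # normalize with softmax, and return the weighted sum of values.
    # With masked=True, position i may not attend to positions > i.
    d = len(queries[0])
    out = []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            if masked and j > i:
                # Prevent attending to subsequent positions.
                scores.append(float("-inf"))
            else:
                scores.append(sum(qc * kc for qc, kc in zip(q, k)) / math.sqrt(d))
        weights = softmax(scores)
        out.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return out

q = k = v = [[1.0, 0.0], [0.0, 1.0]]
unmasked = scaled_dot_product_attention(q, k, v)
masked = scaled_dot_product_attention(q, k, v, masked=True)
```

With masking, the first position's output depends only on the first value vector, illustrating why a prediction at a given position depends only on known output at earlier positions.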
  • Feed forward layers 566 and 586 may each be a fully connected feed-forward network, which applies to each position.
  • feed forward layers 566 and 586 each include a plurality of linear transformations with a rectified linear unit activation in between.
  • each linear transformation is the same across different positions, while different parameters may be used as compared to other linear transformations of the feed-forward network.
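A minimal position-wise feed-forward sketch follows: two linear transformations with a rectified linear unit in between, applied with the same parameters at every position. The 2-dimensional weights are hypothetical examples.

```python
def relu(xs):
    # Rectified linear unit activation, elementwise.
    return [max(0.0, x) for x in xs]

def linear(x, weights, bias):
    # One linear transformation: y = W x + b.
    return [
        sum(w * xc for w, xc in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]

def feed_forward(positions, w1, b1, w2, b2):
    # Two linear transformations with a ReLU in between, applied
    # identically (same parameters) at every position.
    return [linear(relu(linear(x, w1, b1)), w2, b2) for x in positions]

# Hypothetical 2-dimensional example weights.
w1, b1 = [[1.0, -1.0], [0.0, 1.0]], [0.0, 0.0]
w2, b2 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
out = feed_forward([[1.0, 2.0], [3.0, 1.0]], w1, b1, w2, b2)
```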
  • linear transformation 592 may be similar to the linear transformations discussed above with respect to multi-head attention layers 562, 578, and 582, as well as feed forward layers 566 and 586.
  • Softmax 594 may further convert the output of linear transformation 592 to predicted next-token probabilities, as indicated by output probabilities 596.
  • the illustrated architecture is provided as an example and, in other examples, any of a variety of other model architectures may be used in accordance with the disclosed aspects.
  • multiple iterations of processing are performed according to the above-described aspects (e.g., using generative model package 504 in FIG. 5A or encoder 552 and decoder 554 in FIG. 5B).
  • output probabilities 596 may thus form chained ML evaluation output 506 according to aspects described herein, such that the output of the generative ML model (e.g., which may include structured output) is used as input for a subsequent skill of a skill chain according to aspects described herein (e.g., similar to a “YES” determination at determination 314 of method 300 in Figure 3).
  • chained ML evaluation output 506 is provided as generated output after processing a skill chain (e.g., similar to aspects of operation 316 of method 300), which may further be processed according to the disclosed aspects.
  • Figures 6-8 and the associated descriptions provide a discussion of a variety of operating environments in which aspects of the disclosure may be practiced.
  • the devices and systems illustrated and discussed with respect to Figures 6-8 are for purposes of example and illustration and are not limiting of a vast number of computing device configurations that may be utilized for practicing aspects of the disclosure, described herein.
  • FIG. 6 is a block diagram illustrating physical components (e.g., hardware) of a computing device 600 with which aspects of the disclosure may be practiced.
  • the computing device components described below may be suitable for the computing devices described above, including one or more devices associated with machine learning service 102, as well as computing device 104 discussed above with respect to Figure 1.
  • the computing device 600 may include at least one processing unit 602 and a system memory 604.
  • the system memory 604 may comprise, but is not limited to, volatile storage (e.g., random access memory), non-volatile storage (e.g., read-only memory), flash memory, or any combination of such memories.
  • the system memory 604 may include an operating system 605 and one or more program modules 606 suitable for running software application 620, such as one or more components supported by the systems described herein.
  • system memory 604 may store chain orchestrator 624 and recall engine 626.
  • the operating system 605, for example, may be suitable for controlling the operation of the computing device 600.
  • This basic configuration is illustrated in FIG. 6 by those components within a dashed line 608.
  • the computing device 600 may have additional features or functionality.
  • the computing device 600 may also include additional data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape.
  • additional storage is illustrated in FIG. 6 by a removable storage device 609 and a nonremovable storage device 610.
  • program modules 606 may perform processes including, but not limited to, the aspects, as described herein.
  • Other program modules may include electronic mail and contacts applications, word processing applications, spreadsheet applications, database applications, slide presentation applications, drawing or computer-aided application programs, etc.
  • embodiments of the disclosure may be practiced in an electrical circuit comprising discrete electronic elements, packaged or integrated electronic chips containing logic gates, a circuit utilizing a microprocessor, or on a single chip containing electronic elements or microprocessors.
  • embodiments of the disclosure may be practiced via a system-on-a-chip (SOC) where each or many of the components illustrated in FIG. 6 may be integrated onto a single integrated circuit.
  • SOC device may include one or more processing units, graphics units, communications units, system virtualization units and various application functionality all of which are integrated (or “burned”) onto the chip substrate as a single integrated circuit.
  • the functionality, described herein, with respect to the capability of client to switch protocols may be operated via application-specific logic integrated with other components of the computing device 600 on the single integrated circuit (chip).
  • Embodiments of the disclosure may also be practiced using other technologies capable of performing logical operations such as, for example, AND, OR, and NOT, including but not limited to mechanical, optical, fluidic, and quantum technologies.
  • embodiments of the disclosure may be practiced within a general purpose computer or in any other circuits or systems.
  • the computing device 600 may also have one or more input device(s) 612 such as a keyboard, a mouse, a pen, a sound or voice input device, a touch or swipe input device, etc.
  • the output device(s) 614 such as a display, speakers, a printer, etc. may also be included.
  • the aforementioned devices are examples and others may be used.
  • the computing device 600 may include one or more communication connections 616 allowing communications with other computing devices 650. Examples of suitable communication connections 616 include, but are not limited to, radio frequency (RF) transmitter, receiver, and/or transceiver circuitry; universal serial bus (USB), parallel, and/or serial ports.
  • Computer readable media may include computer storage media.
  • Computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, or program modules.
  • the system memory 604, the removable storage device 609, and the non-removable storage device 610 are all computer storage media examples (e.g., memory storage).
  • Computer storage media may include RAM, ROM, electrically erasable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other article of manufacture which can be used to store information and which can be accessed by the computing device 600. Any such computer storage media may be part of the computing device 600.
  • Computer storage media does not include a carrier wave or other propagated or modulated data signal.
  • Communication media may be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media.
  • modulated data signal may describe a signal that has one or more characteristics set or changed in such a manner as to encode information in the signal.
  • communication media may include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared, and other wireless media.
  • FIG. 7 illustrates a system 700 that may, for example, be a mobile computing device, such as a mobile telephone, a smart phone, wearable computer (such as a smart watch), a tablet computer, a laptop computer, and the like, with which embodiments of the disclosure may be practiced.
  • the system 700 is implemented as a “smart phone” capable of running one or more applications (e.g., browser, e-mail, calendaring, contact managers, messaging clients, games, and media clients/players).
  • the system 700 is integrated as a computing device, such as an integrated personal digital assistant (PDA) and wireless phone.
  • PDA personal digital assistant
  • the system 700 typically includes a display 705 and one or more input buttons that allow the user to enter information into the system 700.
  • the display 705 may also function as an input device (e.g., a touch screen display).
  • an optional side input element allows further user input.
  • the side input element may be a rotary switch, a button, or any other type of manual input element.
  • system 700 may incorporate more or fewer input elements.
  • the display 705 may not be a touch screen in some embodiments.
  • an optional keypad 735 may also be included, which may be a physical keypad or a “soft” keypad generated on the touch screen display.
  • the output elements include the display 705 for showing a graphical user interface (GUI), a visual indicator (e.g., a light emitting diode 720), and/or an audio transducer 725 (e.g., a speaker).
  • GUI graphical user interface
  • a vibration transducer is included for providing the user with tactile feedback.
  • input and/or output ports are included, such as an audio input (e.g., a microphone jack), an audio output (e.g., a headphone jack), and a video output (e.g., an HDMI port) for sending signals to or receiving signals from an external device.
  • One or more application programs 766 may be loaded into the memory 762 and run on or in association with the operating system 764. Examples of the application programs include phone dialer programs, e-mail programs, personal information management (PIM) programs, word processing programs, spreadsheet programs, Internet browser programs, messaging programs, and so forth.
  • the system 700 also includes a non-volatile storage area 768 within the memory 762. The non-volatile storage area 768 may be used to store persistent information that should not be lost if the system 700 is powered down.
  • the application programs 766 may use and store information in the non-volatile storage area 768, such as e-mail or other messages used by an e-mail application, and the like.
  • a synchronization application (not shown) also resides on the system 700 and is programmed to interact with a corresponding synchronization application resident on a host computer to keep the information stored in the non-volatile storage area 768 synchronized with corresponding information stored at the host computer.
  • other applications may be loaded into the memory 762 and run on the system 700 described herein.
  • the system 700 has a power supply 770, which may be implemented as one or more batteries.
  • the power supply 770 might further include an external power source, such as an AC adapter or a powered docking cradle that supplements or recharges the batteries.
  • the system 700 may also include a radio interface layer 772 that performs the function of transmitting and receiving radio frequency communications.
  • the radio interface layer 772 facilitates wireless connectivity between the system 700 and the “outside world,” via a communications carrier or service provider. Transmissions to and from the radio interface layer 772 are conducted under control of the operating system 764. In other words, communications received by the radio interface layer 772 may be disseminated to the application programs 766 via the operating system 764, and vice versa.
  • the visual indicator 720 may be used to provide visual notifications, and/or an audio interface 774 may be used for producing audible notifications via the audio transducer 725.
  • the visual indicator 720 is a light emitting diode (LED) and the audio transducer 725 is a speaker.
  • LED light emitting diode
  • the LED may be programmed to remain on indefinitely, indicating the powered-on status of the device, until the user takes action.
  • the audio interface 774 is used to provide audible signals to and receive audible signals from the user.
  • the audio interface 774 may also be coupled to a microphone to receive audible input, such as to facilitate a telephone conversation.
  • the microphone may also serve as an audio sensor to facilitate control of notifications, as will be described below.
  • the system 700 may further include a video interface 776 that enables an operation of an on-board camera 730 to record still images, video stream, and the like.
  • system 700 may have additional features or functionality.
  • system 700 may also include additional data storage devices (removable and/or non-removable) such as magnetic disks, optical disks, or tape. Such additional storage is illustrated in FIG. 7 by the non-volatile storage area 768.
  • Data/information generated or captured and stored via the system 700 may be stored locally, as described above, or the data may be stored on any number of storage media that may be accessed by the device via the radio interface layer 772 or via a wired connection between the system 700 and a separate computing device associated with the system 700, for example, a server computer in a distributed computing network, such as the Internet. As should be appreciated, such data/information may be accessed via the radio interface layer 772 or via a distributed computing network. Similarly, such data/information may be readily transferred between computing devices for storage and use according to any of a variety of data/information transfer and storage means, including electronic mail and collaborative data/information sharing systems.
  • FIG. 8 illustrates one aspect of the architecture of a system for processing data received at a computing system from a remote source, such as a personal computer 804, tablet computing device 806, or mobile computing device 808, as described above.
  • Content displayed at server device 802 may be stored in different communication channels or other storage types.
  • various documents may be stored using a directory service 824, a web portal 825, a mailbox service 826, an instant messaging store 828, or a social networking site 830.
  • a multi-stage machine learning framework 820 (e.g., similar to the application 620) may be employed by a client that communicates with server device 802. Additionally, or alternatively, chain orchestrator 821 may be employed by server device 802.
  • the server device 802 may provide data to and from a client computing device such as a personal computer 804, a tablet computing device 806 and/or a mobile computing device 808 (e.g., a smart phone) through a network 815.
  • the computer system described above may be embodied in a personal computer 804, a tablet computing device 806 and/or a mobile computing device 808 (e.g., a smart phone).
  • any of these examples of the computing devices may obtain content from the store 816, in addition to receiving graphical data useable to be either pre-processed at a graphic-originating system, or post-processed at a receiving computing system.
  • the aspects and functionalities described herein may operate over distributed systems (e.g., cloud-based computing systems), where application functionality, memory, data storage and retrieval and various processing functions may be operated remotely from each other over a distributed computing network, such as the Internet or an intranet.
  • User interfaces and information of various types may be displayed via on-board computing device displays or via remote display units associated with one or more computing devices. For example, user interfaces and information of various types may be displayed and interacted with on a wall surface onto which user interfaces and information of various types are projected.
  • Interaction with the multitude of computing systems with which embodiments of the invention may be practiced includes keystroke entry, touch screen entry, voice or other audio entry, gesture entry where an associated computing device is equipped with detection (e.g., camera) functionality for capturing and interpreting user gestures for controlling the functionality of the computing device, and the like.
  • one aspect of the technology relates to a system comprising: at least one processor; and memory storing instructions that, when executed by the at least one processor, cause the system to perform a set of operations.
  • generating the skill chain comprises: generating a skill listing corresponding to a set of skills of a skill library, wherein the skill listing includes a description for each skill of the set of skills; providing, to a machine learning service, an indication of the user input and the skill listing; and receiving, from the machine learning service, the skill chain corresponding to the user input.
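As a concrete sketch of the skill-listing approach above, the snippet below builds a planning prompt from a skill library and filters the chain the model returns. Every name here (`SKILL_LIBRARY`, `build_planning_prompt`, `parse_skill_chain`) and the simulated model reply are illustrative assumptions, not the patented implementation; a real system would send the prompt to a machine learning service.

```python
# Hypothetical sketch of model-planned skill chaining: the skill listing
# (names + descriptions) and the user input are combined into one prompt,
# and the model's reply is parsed into an ordered skill chain.

SKILL_LIBRARY = {
    "summarize": "Condense a passage of text into a short summary.",
    "translate": "Translate text from one language to another.",
    "extract_dates": "Return every calendar date mentioned in the text.",
}

def build_planning_prompt(user_input: str) -> str:
    """Combine the skill listing and the user input into one prompt
    asking the model to emit an ordered skill chain."""
    listing = "\n".join(f"- {name}: {desc}" for name, desc in SKILL_LIBRARY.items())
    return (
        "Available skills:\n"
        f"{listing}\n\n"
        f"User request: {user_input}\n"
        "Reply with a comma-separated, ordered list of skill names."
    )

def parse_skill_chain(model_response: str) -> list[str]:
    """Keep only names present in the skill library, preserving order."""
    names = [name.strip() for name in model_response.split(",")]
    return [name for name in names if name in SKILL_LIBRARY]

prompt = build_planning_prompt("Summarize this report in French")
# Stand-in for the machine learning service's response to `prompt`.
chain = parse_skill_chain("summarize, translate")
print(chain)  # ['summarize', 'translate']
```

Filtering the response against the library guards against the model naming a skill that does not exist.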
  • generating the skill chain comprises: generating, for the user input, an input embedding that encodes an intent of the user input; determining, from a skill library, a set of skills that each have an associated semantic embedding that matches the generated input embedding; and generating the skill chain based on the determined set of skills.
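The embedding-matching alternative can be sketched with a toy encoder. The bag-of-words `embed` function and the match threshold below are stand-in assumptions chosen so the example is self-contained; a real system would use a learned semantic embedding model for both the input and the skill descriptions.

```python
# Toy sketch of embedding-based skill selection: each skill in the library
# stores an embedding of its description, and skills whose embeddings match
# the input embedding are kept. The bag-of-words encoder is a stand-in for
# a real semantic model.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: lower-cased bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

SKILL_EMBEDDINGS = {
    "summarize": embed("condense summarize shorten text summary"),
    "translate": embed("translate language french spanish english"),
}

def match_skills(user_input: str, threshold: float = 0.1) -> list[str]:
    """Return skills whose stored embedding matches the input embedding,
    highest-scoring first."""
    query = embed(user_input)
    scored = [(cosine(query, emb), name) for name, emb in SKILL_EMBEDDINGS.items()]
    return [name for score, name in sorted(scored, reverse=True) if score >= threshold]

print(match_skills("summarize this text"))  # ['summarize']
```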
  • processing the first prompt to obtain the intermediate output comprises: providing, to a machine learning service, a request to process the first prompt using the first machine learning model; and receiving, from the machine learning service, a response that includes the intermediate output.
  • the intermediate output of the first model skill includes structured output.
  • at least a part of the first prompt corresponds to the structured output.
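The structured-output handoff described in the preceding bullets can be illustrated with JSON: the first model skill is prompted to reply in a machine-readable form, and a later prompt template is filled from the parsed fields. The templates, field names, and sample intermediate output here are all hypothetical.

```python
# Sketch of structured intermediate output feeding a subsequent prompt.
# The prompt templates and JSON schema are illustrative assumptions.
import json

FIRST_PROMPT_TEMPLATE = (
    "Extract the sender and topic from this message as JSON with keys "
    '"sender" and "topic":\n{user_input}'
)
SECOND_PROMPT_TEMPLATE = "Draft a polite reply to {sender} about {topic}."

def build_second_prompt(intermediate_output: str) -> str:
    """Parse the first skill's structured output and splice its fields
    into the second skill's prompt template."""
    fields = json.loads(intermediate_output)
    return SECOND_PROMPT_TEMPLATE.format(**fields)

# Stand-in for what the first model evaluation might return.
intermediate = '{"sender": "Avery", "topic": "the Q3 budget"}'
print(build_second_prompt(intermediate))
# Draft a polite reply to Avery about the Q3 budget.
```

Because the intermediate output is structured, a programmatic step can validate or transform it deterministically before any further model evaluation.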
  • in another aspect, the technology relates to a method.
  • the method comprises: obtaining, at a computing device, a skill chain corresponding to an input; for a first model skill of the skill chain: generating, based on a first prompt template associated with the first model skill, a first prompt that includes at least a part of the input; and processing, using a first machine learning model associated with the first model skill, the first prompt to obtain intermediate output; for a second model skill of the skill chain: generating, based on a second prompt template associated with the second model skill, a second prompt that includes at least a part of the intermediate output as input for the second model skill; and processing, using a second machine learning model associated with the second model skill, the second prompt to obtain model output; and processing, by the computing device, at least a part of the model output to affect operation of the computing device.
  • the first machine learning model is the second machine learning model.
  • the skill chain further comprises a programmatic skill that is performed by the computing device; and output of the programmatic skill is processed as input for the second model skill.
  • the intermediate output of the first model skill includes structured output that is processed by the programmatic skill.
  • processing the part of the model output comprises displaying the part of the model output to a user of the computing device.
  • processing the part of the model output comprises parsing, by an application of the computing device, the part of the model output to affect operation of the application.
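Putting the pieces of this method together, a minimal two-stage chain might look like the sketch below: a first model skill, a programmatic skill that post-processes its intermediate output, and a second model skill whose output the application then consumes. The deterministic `fake_model` stub stands in for calls to a machine learning service, and every name and template is an illustrative assumption.

```python
# Hypothetical end-to-end sketch of a two-stage skill chain with an
# interleaved programmatic skill. fake_model is a deterministic stub
# standing in for a machine learning service.

def fake_model(prompt: str) -> str:
    """Stub model evaluation; a real system would call a model service."""
    if "List the action items" in prompt:
        return "file expenses; book travel"
    return "TODO added: " + prompt.split(": ", 1)[1]

def run_chain(user_input: str) -> list[str]:
    # First model skill: first prompt template filled with the user input.
    first_prompt = f"List the action items in: {user_input}"
    intermediate = fake_model(first_prompt)

    # Programmatic skill: deterministic, runs on-device, no model call.
    items = [item.strip() for item in intermediate.split(";")]

    # Second model skill: second prompt template filled per item; the
    # application can parse these results to affect its own operation.
    return [fake_model(f"Create a reminder for: {item}") for item in items]

results = run_chain("Remember to file expenses and book travel.")
print(results)  # ['TODO added: file expenses', 'TODO added: book travel']
```

Note that the two model skills here happen to share one stub; as the claims observe, the first and second machine learning models may be the same model.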
  • the technology relates to another method.
  • the method comprises: obtaining user input from a user; generating, based on the user input, a skill chain that includes a set of skills with which to process the user input; for a first model skill of the skill chain: generating, based on a first prompt template associated with the first model skill, a first prompt that includes at least a part of the obtained user input; and processing, using a first machine learning model associated with the first model skill, the first prompt to obtain intermediate output; for a second model skill of the skill chain: generating, based on a second prompt template associated with the second model skill, a second prompt that includes at least a part of the intermediate output as input for the second model skill; and processing, using a second machine learning model associated with the second model skill, the second prompt to obtain model output; and providing an indication of the model output for display to the user.
  • generating the skill chain comprises: generating a skill listing corresponding to a set of skills of a skill library, wherein the skill listing includes a description for each skill of the set of skills; providing, to a machine learning service, an indication of the user input and the skill listing; and receiving, from the machine learning service, the skill chain corresponding to the user input.
  • generating the skill chain comprises: generating, for the user input, an input embedding that encodes an intent of the user input; determining, from a skill library, a set of skills that each have an associated semantic embedding that matches the generated input embedding; and generating the skill chain based on the determined set of skills.
  • processing the first prompt to obtain the intermediate output comprises: providing, to a machine learning service, a request to process the first prompt using the first machine learning model; and receiving, from the machine learning service, a response that includes the intermediate output.
  • the intermediate output of the first model skill includes structured output.
  • at least a part of the first prompt corresponds to the structured output.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

According to the invention, a skill chain composed of a set of machine learning model evaluations for processing an input is generated and used to ultimately produce model output accordingly. Each machine learning model evaluation corresponds to a "model skill" of the skill chain. Intermediate output generated by a first machine learning evaluation for a first model skill of the skill chain may then be processed as input to a second machine learning evaluation for a second model skill of the skill chain, thereby ultimately generating model output for the given input. Such a skill chain may include any number of skills in any of a variety of structures, and its evaluations need not use the same machine learning model.
PCT/US2023/081254 2022-12-19 2023-11-28 Multi-stage machine learning model chaining WO2024137122A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202263433627P 2022-12-19 2022-12-19
US63/433,627 2022-12-19
US18/122,575 2023-03-16
US18/122,575 US20240202582A1 (en) 2022-12-19 2023-03-16 Multi-stage machine learning model chaining

Publications (1)

Publication Number Publication Date
WO2024137122A1 true WO2024137122A1 (fr) 2024-06-27

Family

ID=89507595

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/081254 WO2024137122A1 (fr) Multi-stage machine learning model chaining

Country Status (1)

Country Link
WO (1) WO2024137122A1 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11017780B2 (en) * 2017-08-02 2021-05-25 Veritone, Inc. System and methods for neural network orchestration
US20210224336A1 (en) * 2017-07-10 2021-07-22 Ebay Inc. Expandable service architecture with configurable dialogue manager


Similar Documents

Publication Publication Date Title
US11200269B2 (en) Method and system for highlighting answer phrases
US9965465B2 (en) Distributed server system for language understanding
CN107735804B (zh) Systems and methods for transfer learning techniques for different label sets
US9996532B2 (en) Systems and methods for building state specific multi-turn contextual language understanding systems
US20230142892A1 (en) Policy authoring for task state tracking during dialogue
CN107592926B (zh) Establishing multimodal collaborative dialog using task frames
CN111954864A (zh) Automated presentation control
US20180061393A1 Systems and methods for artificial intelligence voice evolution
EP3549034A1 (fr) Systems and methods for automated generation of responses to queries
US20140350931A1 (en) Language model trained using predicted queries from statistical machine translation
US11829374B2 (en) Document body vectorization and noise-contrastive training
WO2019005387A1 (fr) Command input using robust input parameters
EP4305523A1 (fr) Computer-generated macro recommendations
US10534780B2 (en) Single unified ranker
WO2023129348A1 (fr) Multidirectional generative editing
US20240202582A1 (en) Multi-stage machine learning model chaining
US20220405709A1 (en) Smart Notifications Based Upon Comment Intent Classification
WO2024137122A1 (fr) Multi-stage machine learning model chaining
US20240202584A1 (en) Machine learning instancing
US11250074B2 (en) Auto-generation of key-value clusters to classify implicit app queries and increase coverage for existing classified queries
US20240201959A1 (en) Machine learning structured result generation
US20240202451A1 (en) Multi-dimensional entity generation from natural language input
US20240202460A1 (en) Interfacing with a skill store
WO2024137183A1 (fr) Machine learning instancing
US20230409654A1 (en) On-Device Artificial Intelligence Processing In-Browser