EP3590050A1 - Developer platform for providing an automated assistant in new domains - Google Patents

Developer platform for providing an automated assistant in new domains

Info

Publication number
EP3590050A1
Authority
EP
European Patent Office
Prior art keywords
domain
automated assistant
language
data
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP18761097.7A
Other languages
English (en)
French (fr)
Other versions
EP3590050A4 (de)
Inventor
David Leo Wright HALL
Daniel Klein
David Ernesto Heekin BURKETT
Jordan Rian COHEN
Daniel Lawrence Roth
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Semantic Machines Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Semantic Machines Inc filed Critical Semantic Machines Inc
Priority claimed from PCT/US2018/020784 (WO2018161048A1)
Publication of EP3590050A1
Publication of EP3590050A4

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation
    • G06F40/40 Processing or translation of natural language
    • G06F40/55 Rule-based translation
    • G06F40/56 Natural language generation

Definitions

  • a dialogue assistant that is trained in a first domain can receive a specification in a second domain.
  • the specification can include language structure data such as schemas, recognizers, resolvers, constraints and invariants, actions, language hints, generation templates, and other data.
  • the specification data is applied to the automated assistant to enable the automated assistant to provide interactive dialogue with a user in a second domain associated with the received specification.
  • portions of the specification may be automatically mapped to portions of the first domain, while other portions of the specification may be mapped over time through learning or through input received from annotators or other sources.
  • the present system includes an automated assistant platform which allows the developer to leverage the language competence learned by previous applications, while taking advantage of the ease of integration of the automated assistant with data associated with a new application
  • a method provides an automated assistant in multiple domains.
  • the method includes receiving a specification for a second domain for an automated assistant, wherein the automated assistant is configured with training data for a first domain.
  • the specification can be applied to the automated assistant, the automated assistant utilizing the specification and the first domain.
  • Interactive dialogue can be conducted with a user by the automated assistant based on the first domain and the applied specification.
  • a non-transitory computer readable storage medium has embodied thereon a program, wherein the program is executable by a processor to perform the method for providing an automated assistant in multiple domains.
  • a system includes a processor, memory, one or more modules stored in memory and executable by the processor to perform operations similar to the method described above.
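The claimed method can be outlined with a short sketch; the class names, schemas, and domains below are illustrative stand-ins, not part of the specification:

```python
# Hypothetical sketch: an assistant configured for a first domain receives a
# specification for a second domain, applies it, and can then operate in both.
# All names (Assistant, Specification) and data are invented for illustration.

class Specification:
    """Language-structure data for a domain: schemas, recognizers, etc."""
    def __init__(self, domain, schemas, recognizers=None):
        self.domain = domain
        self.schemas = schemas                 # e.g. {"movie": ["title", "rating"]}
        self.recognizers = recognizers or {}   # e.g. {"rating": ["PG", "R"]}

class Assistant:
    def __init__(self, first_domain_spec):
        # The assistant starts out configured for a single (first) domain.
        self.specs = {first_domain_spec.domain: first_domain_spec}

    def apply_specification(self, spec):
        """Enable dialogue in a new domain by registering its specification."""
        self.specs[spec.domain] = spec

    def supported_domains(self):
        return sorted(self.specs)

flights = Specification("flights", {"flight": ["departure", "arrival"]})
movies = Specification("movies", {"movie": ["title", "rating", "duration"]})

assistant = Assistant(flights)
assistant.apply_specification(movies)
```

In this sketch the first-domain training survives unchanged; applying the second specification only extends the set of domains the assistant can converse in.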
  • FIGURE 1 is a block diagram of a system for providing an automated assistant that can be implemented in multiple domains.
  • FIGURE 2 is a block diagram of modules implementing an automated assistant application that can be expanded to operate in multiple domains.
  • FIGURE 3 is a block diagram of an automated assistant that receives data for a new domain.
  • FIGURE 4 is a method for providing an interactive automated assistant in multiple domains.
  • FIGURE 5 is a method for receiving a specification for a second domain.
  • FIGURE 6 is a method for providing an interactive automated assistant using a first specification and a second specification.
  • FIGURE 7 is a block diagram of a computing environment for implementing the present technology.
  • the present technology provides a sharable language interface for implementing automated assistants in new domains and applications.
  • the description of such a system may be found in U.S. patent application no. 15/298475, titled "The Attentive Assistant," and U.S. patent application no. 15/328448, titled "Interaction Assistant," the disclosures of which are incorporated by reference herein in their entirety.
  • FIGURE 1 is a block diagram of a system for providing an automated assistant that can be implemented in multiple domains.
  • System 100 of FIGURE 1 includes client 110, mobile device 120, computing device 130, network 140, network server 150, application server 160, and data store 170.
  • Client 110, mobile device 120, and computing device 130 communicate with network server 150 over network 140.
  • Network 140 may include a private network, public network, the Internet, an intranet, a WAN, a LAN, a cellular network, or some other network suitable for the transmission of data between computing devices of FIGURE 1.
  • Client 110 includes application 112.
  • Application 112 may provide an automated assistant, TTS functionality, automatic speech recognition, parsing, domain detection, and other functionality discussed herein.
  • Application 112 may be implemented as one or more applications, objects, modules, or other software.
  • Application 112 may communicate with application server 160 and data store 170 through the server architecture of FIGURE 1 or directly (not illustrated in figure 1) to access data.
  • Mobile device 120 may include a mobile application 122.
  • the mobile application may provide the same functionality described with respect to application 112.
  • Mobile application 122 may be implemented as one or more applications, objects, modules, or other software, and may operate to provide services in conjunction with application server 160.
  • Computing device 130 may include a network browser 132.
  • the network browser may receive one or more content pages, script code, and other code that, when loaded into the network browser, provide the same functionality described with respect to application 112.
  • the content pages may operate to provide services in conjunction with application server 160.
  • Network server 150 may receive requests and data from application 112, mobile application 122, and network browser 132 via network 140. The request may be initiated by the particular applications or browser applications. Network server 150 may process the request and data, transmit a response, or transmit the request and data or other content to application server 160.
  • Application server 160 includes application 162.
  • the application server may receive data, including data requests received from applications 112 and 122 and browser 132, process the data, and transmit a response to network server 150.
  • the network server 150 forwards responses to the computer or application that originally sent the request.
  • Application server 160 may also communicate with data store 170. For example, data can be accessed from data store 170 to be used by an application to provide the functionality described with respect to application 112.
  • Application server 160 includes application 162, which may operate similarly to application 112 except implemented in whole or in part on application server 160.
  • Block 200 includes network server 150, application server 160, and data store 170, and may be used to implement an automated assistant that includes a domain detection mechanism. Block 200 is discussed in more detail with respect to FIGURE 2.
  • FIGURE 2 is a block diagram of modules implementing an automated assistant application that can be expanded to operate in multiple domains.
  • the modules comprising the automated assistant application may implement all or a portion of application 112 of client 110, mobile application 122 of mobile device 120, and/or application 162 of application server 160 in the system of FIGURE 1.
  • the automated assistant application of FIGURE 2 includes automatic speech recognition module 210, parser module 220, paraphrase module 222, autocorrect module 224, detection mechanism module 230, dialog manager module 240, inference module 242, dialogue pattern module 244, and text to speech (generation) module 250.
  • Automatic speech recognition module 210 receives audio content, such as content received through a microphone from one of client 110, mobile device 120, or computing device 130, and may process the audio content to identify speech.
  • the ASR module can output the recognized speech as a text utterance to parser 220.
  • Parser 220 receives the speech utterance, which includes one or more words, and can interpret a user utterance into intentions. Parser 220 may generate one or more plans, for example by creating one or more cards, using a current dialogue state received from elsewhere in the automated assistant. For example, parser 220, as a result of performing a parsing operation on the utterance, may generate one or more plans that may include performing one or more actions or tasks. In some instances, a plan may include generating one or more cards within a system. In another example, the action plan may include generating a number of steps by the system, such as that described in U.S. patent application number 62/462,736, filed February 23, 2017, entitled "Expandable Dialogue System," the disclosure of which is incorporated herein in its entirety.
  • a semantic parser is used to create information for the dialog manager.
  • This semantic parser uses information about past usage as a primary source of information, combining the past-use information with system actions and outputs, allowing each collection of words to be described by its contribution to the system actions. This results in creating a semantic description of the words/phrases.
  • Detection mechanism 230 can receive the plan and coverage vector generated by parser 220, detect unparsed words that are likely to be important in the utterance, and modify the plan based on important unparsed words. Detection mechanism 230 may include a classifier that classifies each unparsed word as important or not based on one or more features. For each important word, a determination is made as to whether a score for the important word achieves a threshold. In some instances, any word or phrase candidate which is not already parsed by the system is analyzed by reference to its past statistical occurrences, and the system then decides whether or not to pay attention to the phrases. If the score for the important unparsed word reaches the threshold, the modified plan may include generating a message that the important unparsed word or some action associated with the unparsed word cannot be handled or performed by the automated assistant.
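The scoring-and-threshold logic of the detection mechanism might be sketched as follows; the scores, threshold value, and message format are invented for illustration:

```python
# Illustrative sketch of the detection mechanism: score each unparsed word by
# its past statistical occurrences, and if a word's score reaches a threshold,
# modify the plan with a "cannot handle" message. All values are made up.

THRESHOLD = 0.5

def importance_score(word, stats):
    """Stand-in heuristic: look up the word's past-occurrence statistic."""
    return stats.get(word, 0.0)

def modify_plan(plan, unparsed_words, stats, threshold=THRESHOLD):
    plan = list(plan)
    for word in unparsed_words:
        if importance_score(word, stats) >= threshold:
            plan.append(f"inform_user: cannot handle '{word}'")
    return plan

stats = {"refund": 0.9, "um": 0.1}       # hypothetical occurrence statistics
plan = modify_plan(["book_flight"], ["um", "refund"], stats)
```

Here "um" falls below the threshold and is ignored, while "refund" is important enough that the plan is modified to inform the user it cannot be handled.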
  • Dialog manager 240 may perform actions based on a plan and context received from detection mechanism 230 and/or parser 220 and generate a response based on the actions performed and any responses received, for example from external services and entities.
  • the dialog manager's generated response may be output to text-to-speech module 250.
  • Text-to-speech module 250 may receive the response, generate speech from the received response, and output the speech to a device associated with a user.
  • Paraphrase module 222 may communicate with parser 220 to provide paraphrase content for words or phrases in the utterance received by parser 220.
  • AutoCorrect module 224 may correct or suggest alternative spellings for words or phrases in the utterance received by parser 220.
  • Inference module 242 can be used to search databases and interact with users.
  • the engine is augmented by per-domain-type sub-solvers and a constraint graph appropriate for the domain, and the general purpose engine uses a combination of its own inference mechanisms and the sub-solvers.
  • the general purpose inference engine could be a CSP solver or a weighted variant thereof.
  • solvers include resolvers, constraints, preferences, or more classical domain-specific modules such as one that reasons about constraints on dates and times or numbers. Solvers respond with either results or with a message about the validity of certain constraints, or with information about which constraints must be supplied for it to function.
  • Dialogue pattern module 244 may include domain independent and domain customized patterns that have been learned from past domains.
  • the dialogue patterns may include one or more mechanisms for gathering constraints on a set of objects, for example for the purpose of pursuing a user intent.
  • the dialogue patterns may also include command processing, for example launching a command towards an external process; list processing, which reads emails or messages, plays songs, and so forth; and list building, such as keeping a grocery list, keeping annotations for a diary, creating an agenda for a meeting, and so on.
  • the Automated Assistant provides many of its services through a "UI toolkit" for dialog, including domain-independent and domain-customized patterns that have been learned from past experience.
  • a partial set of such services for a UI toolkit are:
  • Winnowing, which gathers constraints on a set of objects in order to narrow the options. Constraints may be explicit or implied, and may be hard (12 PM) or soft (next week, in the morning, cheap). Winnowing processes are common to booking a flight, buying a camera, finding a song to listen to, determining an airport from which to fly, and many other common tasks.
  • Command processing which launches a command towards some external process.
  • Command processing includes confirmation behavior and error handling.
  • List building including keeping a grocery list, keeping annotations for a diary, creating an agenda for a meeting, adding segments to a flight itinerary, and other functions.
  • Each of the UI toolkit elements is constructed in such a way that it has reasonable "default" behavior that may be tailored either via developer intervention or machine learning. For instance, Winnow may operate in a mode in which it asks the user for confirmation before returning the top option, or a mode in which it does not; it may ask the user to select between several options; or it may summarize the available options and offer new constraints or alternate searches.
  • Each of the UI toolkit offerings is built with many parameters. For instance, should the system always offer only the top choice? Should all (reasonable) choices be described? Should the element summarize the choices and offer alternate searches (or, in the case of a search without success, should the system guide a restatement or a refinement of the constraints?) Will the system automatically constrain the search with default entries, or should default entries be automatically overwritten by users' actions? If the system returns more than one option, is there a default ordering?
  • dialog toolkit elements parameterize these different behaviors with a combination of developer-specified configuration and machine learning "features.”
  • Developer-specified configuration provides the default behavior for the particular instantiation of the ui element. For instance, in a flight-booking application, Winnow will likely be configured to confirm with the user before returning the flight. In this case, the developer would tell Winnow that it should be highly confirmatory. On the other hand, in a music-jukebox application, the system will likely return the song immediately (since likely the user wants to hear music, and not negotiate about which song), and thus in a "low confirmation" state.
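The developer-specified confirmation behavior described above could look roughly like this; `winnow` and its modes are hypothetical names, not platform API:

```python
# Sketch of developer-specified default behavior for a hypothetical Winnow
# element: a flight-booking app configures high confirmation, a music jukebox
# configures low confirmation and returns the song immediately.

def winnow(options, confirmation):
    """Return the top option, or request confirmation first if configured."""
    top = options[0]
    if confirmation == "high":
        return ("confirm", top)     # highly confirmatory: ask the user first
    return ("return", top)          # low confirmation: act immediately

# Flight booking: confirm before committing the user.
flight_result = winnow(["UA 100", "DL 200"], confirmation="high")
# Jukebox: the user wants to hear music, not negotiate about which song.
song_result = winnow(["song A"], confirmation="low")
```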
  • New domain 270 represents specification data received by the automated assistant application to enable the automated assistant to conduct dialogue in the new domain.
  • the new domain data may include schema, resolvers, invariants, constraints, recognizers, lexical hints, generation templates, training snippets, and other data.
  • New domain data 270 is discussed in more detail below with respect to FIGURE 3.
  • a Developer Platform for Dialog should provide these functions to the developer without requiring them to have PhDs in machine learning or linguistics.
  • the platform described here enables this.
  • the system has basic functionality immediately, without needing any additional training data. Rather, the system, using what it has learned from other domains or applications, can interpret the specification provided by the developer to immediately provide functionality not present on other platforms.
  • Vertical applications are those that are common enough that the platform includes specialized functionality that has been developed by the platform's own developers. For instance, the platform may contain pre-built functionality for banking or booking flights.
  • mapping of one vertical database to another might be automated if the two data schema are similar, possibly with human assistance. That is, if the developer has a schema for a banking database, and the Automated Assistant already has a schema for a banking database, it should be very simple to automatically or manually mark corresponding elements, fields, and operations from the two databases. It will then be necessary to fill in missing operations and entities in the schema to fully coordinate the two vertical applications.
  • machine learning may be used to modify the system's internal parameters and learn appropriate weights so that the system will more likely respond correctly than not.
  • New vertical applications will probably not be optimized for performance when first fielded, but they will improve rapidly as the verticals are exercised and the system responses are used to train the assistant.
  • FIGURE 3 is a block diagram of an automated assistant that receives data for a new domain. As illustrated in FIGURE 3, paraphrase module 222 may access and communicate with recognizers, lexical hints, and generation templates to perform paraphrase tasks.
  • AutoCorrect module 224 may access and communicate with recognizers, lexical hints and generation templates to perform auto-correct tasks on portions of a parsed utterance.
  • Inference module 242 may access and communicate with schema, invariants, and constraint modules to search databases and interact with users while communicating with dialogue manager 240.
  • Dialogue pattern module 244 may access and communicate with schema while communicating with dialogue manager 240.
  • An output is generated by generation module 250, which receives information from generation templates 327.
  • a developer can define all or some of: schemas (the types of objects in the domain along with their properties), recognizers (which identify objects referred to in an utterance), resolvers (used for searching for domain objects based on constraints on their properties), actions (used for performing some external action such as booking a flight or deleting an email), invariants (used to assert relationships that are usually or always true), and constraints and/or preferences (used for restricting or ranking objects).
  • a seed set of trigger words or phrases can be used with these elements, though they may also be automatically induced from data.
  • Schemas define the objects in the domain and the relationships between those objects. If developing a new movie recommendation domain, the developer would have to specify schemas for movies, directors, actors, ratings, etc. Schemas have properties, which are named fields with some type. (For instance, movies have ratings, directors, stars, durations, etc.). Many properties are likely to be based on predefined types (integers, strings, times, locations, etc.) or types from other domains.
  • the platform may be able to automatically import these schemas from a pre-existing database schema or another source by providing a function for automatically mapping between e.g. a SQL database and the platform's internal representation.
  • the developer can define them in the platform's own internal representation directly.
  • Schemas enable many features of the system, such as question answering. By defining the schema for movies, the system can automatically answer questions like "How long is that movie?" by searching for an appropriate property with type "duration.”
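The schema-driven question answering described above can be illustrated with a minimal sketch; the schema contents and the lookup function are assumptions, not the platform's representation:

```python
# Sketch of schema-enabled question answering: given a movie schema whose
# properties have named types, "How long is that movie?" resolves to the
# property of type "duration". Schema and movie contents are illustrative.

movie_schema = {
    "title": "string",
    "rating": "rating",
    "director": "person",
    "duration": "duration",
}

def property_of_type(schema, wanted_type):
    """Find the property name whose declared type matches the question."""
    for name, ptype in schema.items():
        if ptype == wanted_type:
            return name
    return None

movie = {"title": "Heat", "rating": "R", "director": "Michael Mann",
         "duration": "170 min"}
prop = property_of_type(movie_schema, "duration")
answer = movie[prop]   # the system can answer "How long is that movie?"
```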
  • Certain object types may also have special functionality called traits that should be indicated to the platform. Examples include:
  • difference between them, including types like money and timestamps.
  • the difference may not be the same type as the original objects. (For instance, the difference of two timestamps is a duration.)
  • the developer may also provide a recognizer for certain types. Recognizers identify entities in language, e.g. from a user utterance. For instance in the movie domain, they might have to provide a recognizer for ratings or actors. Recognizers can be specified in one of several ways, including but not limited to:
  • a simple keyphrase recognizer e.g. recognizing ratings like "PG"
  • Recognizers are used in the semantic parser of the system. Given a recognizer, the system can provide a spelling corrector and a paraphrase module. These modules automatically extend the system in such a way that it can learn that "Nwe York” means “New York”, just as “Big Apple” does. In addition, for non-exhaustive enumerations, the system may attempt to automatically learn new examples, either through future user interactions or through other sources of data.
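A minimal keyphrase recognizer with learned aliases, in the spirit of the spelling-corrector and paraphrase behavior described above (class and method names are illustrative):

```python
# Sketch of a simple keyphrase recognizer extended with learned aliases, so
# that the system can learn that "Nwe York" means "New York", just as
# "Big Apple" does. Names and data are invented for illustration.

class KeyphraseRecognizer:
    def __init__(self, canonical):
        # Map lowercase surface forms to their canonical entity names.
        self.aliases = {name.lower(): name for name in canonical}

    def learn_alias(self, alias, canonical):
        """Learn a paraphrase or misspelling, e.g. from user interactions."""
        self.aliases[alias.lower()] = canonical

    def recognize(self, phrase):
        return self.aliases.get(phrase.lower())

r = KeyphraseRecognizer(["New York", "PG"])
r.learn_alias("Big Apple", "New York")   # paraphrase learned from data
r.learn_alias("Nwe York", "New York")    # spelling correction learned
```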
  • the platform uses developer-provided resolvers, invariants, preferences, and constraints as modules in the assistant's underlying inference mechanism, which is essentially a weighted CSP solver (Reference), augmented with machine-learning or deep-learning modules to improve inference.
  • the system can orchestrate these modules together with its own general purpose inference engine to compose sophisticated queries that no individual API supports.
  • A resolver converts the system query language (provided by the generic Automated Assistant) into API calls used to find entities that match the constraints in the query.
  • Given a query, a resolver should respond with one of the following: a set of results, a message about the validity of certain constraints, or information about which constraints must be supplied for it to function.
  • Developers can also provide a declarative specification of the required properties and acceptable constraints on those properties that a given resolver needs or is capable of using. For instance, an API for flight search may require specific departure dates for all legs of the flight, and an optional upper bound on the cost of the itinerary.
  • resolvers need not ensure that all returned results satisfy all constraints. Instead, the underlying system can automatically apply all constraints post-hoc by filtering them. However, the resolver should specify which constraints in the query were used, so that the system can track when a set of results may become invalidated. (In some embodiments, the underlying platform API may be able to track this information automatically by recording which properties were accessed.)
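The post-hoc filtering contract between a resolver and the platform might be sketched like this; the flight API, data, and constraint names are invented:

```python
# Sketch of post-hoc constraint filtering: the resolver applies only the
# constraints its API supports and reports which ones it used; the platform
# filters the remaining constraints afterwards. All data is illustrative.

def api_flight_search(departure_date):
    """Stand-in for an external API that only filters by departure date."""
    flights = [
        {"date": "2017-02-20", "cost": 300},
        {"date": "2017-02-20", "cost": 900},
        {"date": "2017-02-21", "cost": 250},
    ]
    return [f for f in flights if f["date"] == departure_date]

def resolve(constraints):
    """Return results plus the set of constraints the resolver actually used."""
    results = api_flight_search(constraints["date"])
    return results, {"date"}

def apply_remaining(results, used, constraints):
    """The platform filters any constraints the resolver did not apply."""
    for key, value in constraints.items():
        if key not in used and key == "max_cost":
            results = [r for r in results if r["cost"] <= value]
    return results

query = {"date": "2017-02-20", "max_cost": 500}
results, used = resolve(query)
final = apply_remaining(results, used, query)
```

Because the resolver reports `used`, the platform also knows that a change to the departure date (but not the cost cap) would invalidate the cached result set.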
  • Resolvers may also be created for constraints (discussed below) that may need to query an external database or perform computation in order for the system to evaluate them. For instance, a "place X near Y" constraint may need to query a mapping API in order to determine travel time between the two places.
  • Resolvers are used in the Planning module to determine the domains of variables and the extensions of constraints in the Inference module.
  • Invariants: The developer may specify invariants that must be true or usually are true about objects in the domain. For instance, the developer may specify that the departure time of an outgoing flight is always before that of the returning flight. Invariants may either be declared explicitly in the specification, or they may be returned as an error by a resolver. They may also be hard or soft.
  • invariants may be specified in one of two ways: either through a formal language, or through natural language.
  • formal language examples include:
  • Examples of natural language invariants include:
  • For example, the system can avoid interpreting "returning on the 3rd" as February 3rd if the trip begins on February 20th, and the planner may assume that the user's departure point is near their current location rather than asking the user. Invariants are also used in the Inference system.
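A hard invariant such as "the return date is after the departure date" can be used to prune interpretations, as in this illustrative sketch (dates and names invented):

```python
# Sketch of a hard invariant used to reject an interpretation: "returning on
# the 3rd" cannot mean February 3rd when the trip begins February 20th,
# because the return must come after the departure.

from datetime import date

def satisfies_invariant(departure, return_date):
    """Hard invariant: the return date is strictly after the departure."""
    return return_date > departure

departure = date(2017, 2, 20)
# Two candidate readings of "returning on the 3rd":
candidates = [date(2017, 2, 3), date(2017, 3, 3)]
valid = [d for d in candidates if satisfies_invariant(departure, d)]
```

Only the March 3rd reading survives, so the parser never needs to ask the user which "3rd" was meant.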
  • constraints e.g. inequality, substring matching, location-contained-within-region
  • the developer may provide their own domain-specific constraints. Constraints operate similarly to resolvers, in that they are invoked by the inference mechanism and may make external calls. However, instead of returning a set of values for a single variable, they instead may either return a joint representation of all possible satisfying combinations of arguments, or they may filter or restrict the domains of the existing arguments. As a special case, they must be able to say whether or not a given configuration of arguments is permissible.
  • Functions are just a special class of constraints that are computable based on all but one argument. These may be specially represented in the platform API.
  • the system may treat these cost curves as an a priori guess as to the shape of the curve.
  • the system may learn more precise or user-specific variants of these curves, via, e.g., adjusting the control points of a piecewise composition, adjusting the height of the shape, or replacing the shape of the curve altogether with another more suitable shape. It does so using the same learning regimes described in U.S. patent application nos. 15/298475 ("The Attentive Assistant") and 15/328448 ("Interaction Assistant").
  • the amount of freedom the system has (if any) in the changes it makes may be set by the developer as well.
  • the system uses the parameters describing these curves as inputs along with constraint's arguments and possibly other contextual information (e.g. the context the soft constraint is instantiated in, and/or a user-specific configuration vector) to a feed-forward neural network that is trained to estimate the cost of those arguments directly.
  • the system is also able to use (and the developer is able to provide) constraints that may be violated for some cost and that may have differing degrees of violation (such as "in the morning”). These violable constraints are called preferences.
  • the system has a pre-defined (but optionally expandable) library of preference "shapes,” ranging in complexity and power. The most basic shape is of course the “is the condition precisely true” constraint, which has some fixed cost if it is false and no cost if it is true.
  • the soft constraint "morning" may be represented as a cost curve with a low or zero cost between the hours of, say, 6am and noon, rising slowly from 6am back to 3am, and perhaps quite quickly after noon. (These may be stitched together from simpler curves and a "piecewise" compositor.) Still other constraints may be available for string types (e.g. approximate matches like edit distance) or other types.
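The "morning" cost curve stitched from simple pieces could be sketched as follows; the exact slopes and breakpoints are illustrative guesses, not the system's learned values:

```python
# Sketch of a soft-constraint cost curve for "morning", composed piecewise:
# zero cost between 6am and noon, rising slowly back toward 3am, and rising
# quickly after noon. All slopes and breakpoints are invented.

def morning_cost(hour):
    if 6 <= hour <= 12:
        return 0.0                   # squarely in the morning: no cost
    if 3 <= hour < 6:
        return (6 - hour) * 0.2      # slow rise going back toward 3am
    if hour > 12:
        return (hour - 12) * 1.0     # quick rise after noon
    return 1.0                       # far outside the morning window
```

A weighted solver can then trade this soft cost off against other preferences, instead of treating "morning" as an all-or-nothing filter.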
  • constraint complexes are collections of constraints or preferences that typically go together. The developer may describe constraint complexes that bundle together several (hard or soft) constraints. These complexes may be defined and named using natural or formal language, similar to individual properties or constraints. For example:
  • a "round-trip" itinerary is one with exactly two legs, where the first leaves from the same airport the second arrives at, and vice versa.
  • a "honeymoon” usually entails a romantic hotel, a larger room type, and airfare for two.
  • Constraints, functions, and preferences are used in the system's Planning and Inference modules.
  • In the inference module they are used as the constraints or factors in the underlying (soft) constraint satisfaction problem.
  • In the planning module they are used to generate proposals involving system initiative and repair. (E.g., the system can ask "is this a round-trip?" in lieu of asking how many legs in the flight there are.)
  • Tasks such as playing a song or purchasing a camera are achieved through actions.
  • the tasks are encoded as actions in the present system, and are the concept in the platform most similar to intents in traditional NLI platforms.
  • An action is simply a function (or API call) that takes certain arguments, some of which may be optional.
  • the developer may also differentiate between actions which commit resources outside the Automated Assistant system, and those which are system internal. External actions (like booking a flight) are difficult to correct, and some may commit the user to spend resources which may be hard to recover, like money or votes. Internal actions tend to be easily correctable, and do not generally commit the user's resources. Thus actions may be graded from “dangerous" to "trivial", and the business rules for an application should indicate the impact to the user.
  • One way the system may use the "danger level" of an action is to require explicit confirmation for dangerous actions (such as booking a flight), while allowing trivial interactions without confirmation (such as playing a song).
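A minimal sketch of this confirmation policy, with an assumed three-level grading (the intermediate grade name is illustrative; only "dangerous" and "trivial" appear in the description above):

```python
# hypothetical danger ordering, from least to most consequential
GRADES = ["trivial", "minor", "dangerous"]

def needs_confirmation(action, threshold="dangerous"):
    """Require explicit user confirmation for actions at or above the threshold grade."""
    return GRADES.index(action["grade"]) >= GRADES.index(threshold)
```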
  • Actions are used in the system's Planning module and in the Services module, which may be implemented within a dialogue manager module.
  • Language hints are used both by the system's semantic parser and the system's generation module. They form the basis of the domain's grammar by hinting to the system that a given sequence of words is associated with a concept in the domain. (For the parser, they can be simply used as "trigger phrases" for the relevant concepts.) In addition, the system can generalize based on these hints by feeding them to a paraphrase module together with user utterances.
  • the developer may associate each entity type with "nominal" words or phrases, used for referring to generic objects of that type. For instance, they may say that an itinerary might be referred to as a “flight” or “trip”, or that a movie might be referred to as a “movie”, “flick” or “show”.
  • Language hints may also be provided to aid identification of actions, properties, and constraints. For instance, the developer may associate the "BookItinerary" action with words like "book", "buy", "reserve", or "request". Or they may say that the first leg of an itinerary is the "outgoing" itinerary.
  • the system also needs to know how to describe existing objects or searches for objects. While this may be learnable from data, it is also usually possible to explain to the system how to refer to an object of a given domain. As an example, consider the following for movies:
  • generation templates may also be used to aid in semantic parsing and interpretation.
  • the above template tells the system that "directed by" is associated with the director property.
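The template itself does not survive in this excerpt; a hypothetical generation template of the kind described, in which "directed by" introduces the director property, might be rendered as follows (the slot names and the `render` helper are assumptions for illustration):

```python
import re

def render(template, obj):
    """Fill {field} slots in a generation template from an object's properties."""
    return re.sub(r"\{(\w+)\}", lambda m: str(obj[m.group(1)]), template)

# hypothetical template associating "directed by" with the director property
movie_template = "{title} ({year}) directed by {director} starring {lead}"

movie = {"title": "Indiana Jones and the Temple of Doom", "year": 1984,
         "director": "Steven Spielberg", "lead": "Harrison Ford"}
```

Rendering this template on the example movie reproduces the kind of description given later in this document.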
  • Example utterances from the new domain may be listed with the appropriate system actions, and possibly with the updated query that should be created after each utterance or sequence of utterances.
  • hints for schema properties may be inferred automatically from the name of the property.
  • a field named outgoingDepartureTime easily maps to "outgoing departure time," just as a field named “movie_rating” maps to "movie rating”.
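The name-based inference described above can be sketched with a simple camelCase/snake_case splitter (a sketch of the idea, not the platform's actual heuristic):

```python
import re

def field_to_phrase(name):
    """Split a camelCase or snake_case field name into a natural-language hint."""
    words = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name).replace("_", " ")
    return words.lower()
```

Note that a name like `scrnSizeDiag` splits mechanically but still yields an unhelpful phrase, which is exactly the case discussed below where the developer must supply a mapping.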
  • other possible paraphrases may be provided by the user.
  • the system may have more trouble automatically inferring a natural language description of a field.
  • scrnSizeDiag, a field indicating the diagonal length of a television screen, may be outside the platform's automated inferential capacity. In this case, the developer will have to provide a mapping to a phrase like "screen size", which the system understands.
  • Language hints for actions, constraints, preferences, and constraint complexes can be derived similarly.
  • the action named "BookItinerary" may be analyzed to accept "book" as a trigger.
  • a high-level card in the present system is roughly equivalent to an "intention" of the user of the system.
  • a card at the highest level of the "book a flight" application understands that it must know a departure and arrival airport, rough times for flights, class of service, number of passengers, and constraints on connectivity. There may also be softer constraints on price, time, type of aircraft, and company which is flying the aircraft.
  • In the card for "transferring funds", the user (with the help of the system) must specify the bank, the account in which funds now exist, the account into which the funds must go, the amount, and the rough time of the transfer.
  • Optional constraints may be on the currency of the transferred funds, and the detailed time at which the transfer should be accomplished.
  • the user of the system must either indicate the category and identity of each required entity for the card, accept the elements which are filled in automatically, or interact with the system to fill in all required fields before the card is executed.
  • FIGURE 4 is a method for providing an interactive automated assistant in multiple domains.
  • Training is conducted in a first domain for an automated assistant at step 410.
  • the training may include training data for a first domain and may result in generating and/or learning of specification features such as schema, resolvers, invariants, constraints, recognizers, lexical hints, generation templates, and other data that define a language structure for the first domain.
  • the first specification data for the first domain is stored at step 420.
  • a specification may then be received by the automated assistant for a second domain at step 430.
  • the specification may be received from a developer, may be automatically generated by the automated assistant, or a combination of the two. More details for receiving a specification for a second domain are discussed with respect to the method of FIGURE 5.
  • An interactive automated assistant is provided using the first specification and the second specification at step 440.
  • Providing the interactive automated assistant may include applying features from the first domain and new features from the second domain. This is discussed in more detail with respect to the method of FIGURE 6.
  • FIGURE 5 is a method for receiving a specification for a second domain.
  • the method of FIGURE 5 provides more detail for step 430 of the method of FIGURE 4.
  • schema data for a second domain is received at step 510.
  • the schema data may include properties having named fields with a type.
  • the schema may be automatically imported and may automatically be mapped to schema for a first domain.
  • Recognizer data may be received at step 520. Recognizer data may identify entities within language. In some instances, the recognizers may be used in a semantic parser, for example in an auto-correct module and/or with a paraphrase module.
  • Action data may be received at step 560.
  • An action is a function that takes certain arguments and may commit external resources or internal resources.
  • External actions can be difficult to correct, in some instances, and may be graded according to the level of risk they pose. For example, actions that commit resources with an external service may be graded as dangerous, while an action that performs an internal task may be graded as trivial.
  • actions of selected grades, or of all grades, may be reported to a user before they are performed.
  • Lexical hints may be received at step 570 and generation templates may be received at step 580.
  • Training snippets for the second domain may be received at step 590. Receiving training snippets along with other specification data, rather than a full training set, allows the present system to get up and running much more quickly in a second domain than systems that require a full training data set.
  • Agent behavior parameters may be received at step 595.
  • Agent behavior parameters may indicate the formality of generated language, the size of a response, and other aspects. Developers can specify the personality of the automated assistant for a particular application through the behavior parameters.
  • a partial list of potential agent characteristics includes:
  • the automated assistant can have some or all of these characteristics already coded.
  • the developer can choose a completed personality or can assist in coding a new one or provide data exemplifying the new required personality for either manual or machine learning implementation.
  • FIGURE 6 is a method for providing an interactive automated assistant using a first specification and a second specification.
  • the method of FIGURE 6 provides more detail for step 440 of the method of FIGURE 4.
  • First, general features from the first domain are applied to the automated assistant and second domain specification at step 610.
  • New features from the received specification are then applied to the second domain at step 620.
  • Interactive dialogue may be performed for the user in a second domain using the received specification and the features of the first domain at step 630.
  • the Planning module is tasked with determining how the system will go about conducting the dialog and servicing the user's request(s). Roughly speaking, it looks at the dialog state (which includes a representation of the user's intent) and figures out what to do next, and what components of the system will service it.
  • the system can be implemented as a top-down weighted deduction system, where the system has a set of postconditions to satisfy as a goal (usually produced by external events including user utterances and internal actions), and the system uses deduction operators to find a solution.
  • Goals include:
  • Handle(event) or Handle(utterance): process an incoming event.
  • Update(expr): update the system's internal representation using the results of, e.g., a parse.
  • Propose(expr): propose instantiating a card, a constraint or a constraint template, etc.
  • Weights are learned automatically via the various learning mechanisms described elsewhere. The features involved look at operator identity, properties of the goal(s) including developer hints, the overall dialog state, user preferences, etc.
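A minimal sketch of such a top-down weighted deduction search (the operator names, weights, and data shapes are illustrative; in the system the weights would be learned from features of the operator, goal, and dialog state):

```python
import heapq

def plan(goal, operators, max_steps=100):
    """Best-first search over deduction operators: each operator maps a goal to
    (name, weight, subgoals); a solution is reached when no subgoals remain."""
    # frontier holds (total_weight, tie_breaker, remaining_goals, trace)
    frontier = [(0.0, 0, [goal], [])]
    counter = 1
    while frontier and max_steps > 0:
        max_steps -= 1
        weight, _, goals, trace = heapq.heappop(frontier)
        if not goals:
            return weight, trace  # all postconditions satisfied
        head, rest = goals[0], goals[1:]
        for name, w, subgoals in operators.get(head, []):
            heapq.heappush(frontier, (weight + w, counter, subgoals + rest, trace + [name]))
            counter += 1
    return None
```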
  • the system may also propose various execution strategies to the user, asking the user to confirm an option or to choose amongst options. These proposals might be to ask if the user wants to, e.g., find a hotel before finding a flight itinerary, or it could be used to "sanity check" an execution plan ("Do you really want to search for any airport in Australia?").
  • the user's response can be used as training data in much the same way as it is used for the rest of the dialog system: the system can assign features to its proposals and learn which option the user is likely to take.
  • the system may take into account the costs of inference itself in addition to (its estimate of) the user's preference.
  • certain execution strategies may be especially expensive from a computational, latency, or even financial point of view: some inference algorithms may be especially intractable, or may require accessing a slow database, or cost money for each query executed.
  • the system can be configured to weigh these factors to decide which is appropriate in the circumstance, or to learn to maximize some utility function (e.g., the expected value of the user's eventual purchase). This idea is similar to that of
  • JSON specification for a domain for searching a movie database.
  • Dialog patterns, including multiple strategies for gathering requirements, conducting searches, selecting options, handling errors, and executing actions;
  • the exemplary code defines a domain with two domain-specific types: movies and ratings.
  • the developer has provided custom triggers (language hints) for the "title” property, and for the lead actor, present in the first entry in the actors field. Nominals are used as language hints for referring to an individual movie. These are used for intent detection and for recognizing and generating referring expressions.
  • the developer has also provided a generation template for describing a movie. Using this generation template, the system can describe movies as, e.g., "Indiana Jones and the Temple of Doom (1984) directed by Steven Spielberg starring Harrison Ford". The system can also use this template to describe movies that have not yet been found, e.g. the system can describe a search for "movies starring Harrison Ford” using this same template. There is also a description of the requirements for the resolver for movies.
  • the rating type by contrast is an enumerated type, with all possible ratings specified inline in the specification.
  • the enumeration is used as a recognizer.
  • the developer has not specified any language hints for this type, and so triggers inferred from the specification's field names and default generation are used until better behavior can be learned from data.
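The JSON specification itself is not reproduced in this excerpt; a hypothetical Python rendering consistent with the description above (all field names, triggers, and enumeration values beyond those stated are illustrative assumptions) might look like:

```python
# A hypothetical domain specification of the kind described: two domain-specific
# types (movies and ratings), custom triggers for the title and lead actor,
# nominals, a generation template, and an inline enumeration used as a recognizer.
movie_domain = {
    "types": {
        "movie": {
            "fields": {
                "title": {"type": "String", "triggers": ["title", "called", "named"]},
                "year": {"type": "Number"},
                "director": {"type": "Person"},
                "actors": {"type": "List[Person]",
                           # triggers for the first entry, i.e. the lead actor
                           "triggers_first": ["starring", "lead actor"]},
                "rating": {"type": "rating"},
            },
            "nominals": ["movie", "flick", "show"],
            "generation": "{title} ({year}) directed by {director} starring {actors[0]}",
        },
        "rating": {
            # enumerated inline; the enumeration itself doubles as a recognizer
            "enum": ["G", "PG", "PG-13", "R", "NC-17"],
        },
    },
}
```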
  • FIGURE 7 is a block diagram of a computing environment for implementing the present technology.
  • System 700 of FIGURE 7 may be implemented in the context of devices such as client 110, mobile device 120, computing device 130, network server 150, application server 160, and data stores 170.
  • the computing system 700 of FIGURE 7 includes one or more processors 710 and memory 720.
  • Main memory 720 stores, in part, instructions and data for execution by processor 710.
  • Main memory 720 can store the executable code when in operation.
  • the system 700 of FIGURE 7 further includes a mass storage device 730, portable storage medium drive(s) 740, output devices 750, user input devices 760, a graphics display 770, and peripheral devices 780.
  • The components shown in FIGURE 7 are depicted as being connected via a single bus 790. However, the components may be connected through one or more data transport means.
  • processor unit 710 and main memory 720 may be connected via a local microprocessor bus
  • the mass storage device 730, peripheral device(s) 780, portable or remote storage device 740, and display system 770 may be connected via one or more input/output (I/O) buses.
  • Mass storage device 730, which may be implemented with a magnetic disk drive or an optical disk drive, is a non-volatile storage device for storing data and instructions for use by processor unit 710. Mass storage device 730 can store the system software for implementing embodiments of the present invention for purposes of loading that software into main memory 720.
  • Portable storage device 740 operates in conjunction with a portable non-volatile storage medium, such as a compact disk, digital video disk, magnetic disk, flash storage, etc. to input and output data and code to and from the computer system 700 of FIGURE 7.
  • the system software for implementing embodiments of the present invention may be stored on such a portable medium and input to the computer system 700 via the portable storage device 740.
  • Input devices 760 provide a portion of a user interface.
  • Input devices 760 may include an alpha-numeric keypad, such as a keyboard, for inputting alpha-numeric and other information, or a pointing device, such as a mouse, a trackball, stylus, or cursor direction keys.
  • the system 700 as shown in FIGURE 7 includes output devices 750. Examples of suitable output devices include speakers, printers, network interfaces, and monitors.
  • Display system 770 may include a liquid crystal display (LCD), LED display, touch display, or other suitable display device.
  • Display system 770 receives textual and graphical information and processes the information for output to the display device.
  • Display system 770 may receive input through a touch display and transmit the received input for storage or further processing.
  • Peripherals 780 may include any type of computer support device to add additional functionality to the computer system.
  • peripheral device(s) 780 may include a modem or a router.
  • the computer system 700 of FIGURE 7 can be implemented as a personal computer, hand-held computing device, tablet computer, telephone, mobile computing device, workstation, server, minicomputer, mainframe computer, or any other computing device.
  • the computer can also include different bus configurations, networked platforms, multi-processor platforms, etc.
  • Various operating systems can be used including Unix, Linux, Windows, Apple OS or iOS, Android, and other suitable operating systems, including mobile versions.
  • the computer system 700 of FIGURE 7 may include one or more antennas, radios, and other circuitry for communicating via wireless signals, such as for example communication using Wi-Fi, cellular, or other wireless signals.

EP18761097.7A 2017-03-02 2018-03-02 Entwicklerplattform zur bereitstellung eines automatisierten assistenten in neuen domänen Withdrawn EP3590050A4 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201762465979P 2017-03-02 2017-03-02
PCT/US2018/020784 WO2018161048A1 (en) 2017-03-02 2018-03-02 Developer platform for providing automated assistant in new domains

Publications (2)

Publication Number Publication Date
EP3590050A1 true EP3590050A1 (de) 2020-01-08
EP3590050A4 EP3590050A4 (de) 2021-01-20

Family

ID=68428123

Family Applications (1)

Application Number Title Priority Date Filing Date
EP18761097.7A Withdrawn EP3590050A4 (de) 2017-03-02 2018-03-02 Entwicklerplattform zur bereitstellung eines automatisierten assistenten in neuen domänen

Country Status (2)

Country Link
EP (1) EP3590050A4 (de)
CN (1) CN110447026B (de)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11797820B2 (en) * 2019-12-05 2023-10-24 International Business Machines Corporation Data augmented training of reinforcement learning software agent
US11748128B2 (en) 2019-12-05 2023-09-05 International Business Machines Corporation Flexible artificial intelligence agent infrastructure for adapting processing of a shell

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103221952B (zh) * 2010-09-24 2016-01-20 国际商业机器公司 词法答案类型置信度估计和应用的方法和系统
US9875494B2 (en) * 2013-04-16 2018-01-23 Sri International Using intents to analyze and personalize a user's dialog experience with a virtual personal assistant
US9081411B2 (en) * 2013-05-10 2015-07-14 Sri International Rapid development of virtual personal assistant applications
EP3005150A4 (de) * 2013-06-07 2016-06-15 Apple Inc Intelligenter automatisierter assistent
US11062228B2 * 2015-07-06 2021-07-13 Microsoft Technology Licensing, LLC Transfer learning techniques for disparate label sets

Also Published As

Publication number Publication date
CN110447026A (zh) 2019-11-12
CN110447026B (zh) 2023-07-11
EP3590050A4 (de) 2021-01-20


Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20190827

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

RIN1 Information on inventor provided before grant (corrected)

Inventor name: BURKETT, DAVID, ERNESTO, HEEKIN

Inventor name: ROTH, DANIEL, LAWRENCE

Inventor name: KLEIN, DANIEL

Inventor name: HALL, DAVID LEO WRIGHT

Inventor name: COHEN, JORDAN, RIAN

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC

A4 Supplementary search report drawn up and despatched

Effective date: 20201218

RIC1 Information provided on ipc code assigned before grant

Ipc: G06F 40/35 20200101ALI20201214BHEP

Ipc: G06F 40/295 20200101AFI20201214BHEP

Ipc: G06F 40/56 20200101ALI20201214BHEP

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20210709