CN115392217A - Techniques for preserving rhetorical flow - Google Patents

Techniques for preserving rhetorical flow

Info

Publication number
CN115392217A
Authority
CN
China
Prior art keywords
utterance
tree
conversational
utterance tree
conversation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210451489.9A
Other languages
Chinese (zh)
Inventor
B. Galitsky
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oracle International Corp
Original Assignee
Oracle International Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 17/725,496 (published as US 12141535 B2)
Application filed by Oracle International Corp
Publication of CN115392217A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/40 Processing or translation of natural language
    • G06F 40/42 Data-driven translation
    • G06F 40/49 Data-driven translation using very large corpora, e.g. the web
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Machine Translation (AREA)

Abstract

The present disclosure relates to techniques for preserving rhetorical flow. The disclosed systems, devices, and methods improve dialogue management through the use of dialogue discourse trees (DDTs). To determine whether a candidate response (CR) is appropriate for a dialogue, the CR may be appended to the other utterances previously provided in the dialogue between the parties, and a dialogue discourse tree (DDT) may be generated from the result. The DDT includes nodes corresponding to elementary discourse units (EDUs) that represent text fragments of the utterances and the CR. The DDT may include nodes indicating rhetorical relationships between EDUs. In some embodiments, the DDT includes a node representing at least one dialogue-specific rhetorical relationship between two utterances. The DDT for the CR can be provided to a machine learning model that has been trained to identify whether the rhetorical flow between the utterances of the DDT is preserved. If so, the CR may be provided in response to a request.

Description

Techniques for preserving rhetorical flow
Cross Reference to Related Applications
This application claims the benefit of U.S. patent application No. 17/725,496, filed April 20, 2022, which is a continuation-in-part of U.S. patent application No. 17/071,608, filed October 15, 2020, which is a continuation of U.S. serial No. 15/975,685, filed May 9, 2018, which claims the benefit of U.S. provisional application No. 62/504,377, filed May 10, 2017, each of which is incorporated by reference for all purposes. This application also claims the benefit of U.S. provisional application No. 63/179,926, filed April 26, 2021, which is incorporated by reference for all purposes.
Technical Field
The present disclosure relates generally to linguistics. More particularly, the present disclosure relates to generating and utilizing dialogue discourse trees (DDTs) to represent dialogues and to manage rhetorical flow. The DDTs discussed herein are generated based at least in part on previous utterances and are capable of maintaining appropriate dialogue logic in a dialogue.
Background
Linguistics is the scientific study of language. One aspect of linguistics is the application of computer science to human natural languages such as English. Due to the dramatic increases in processor speed and memory capacity, computer applications of linguistics are on the rise. For example, computer-enabled analysis of linguistic discourse facilitates numerous applications, such as automated agents that can answer questions from users. The use of "chatbots" and agents to answer questions, facilitate discussion, manage dialogues, and provide social promotion is increasingly popular. To address this need, a broad range of technologies, including compositional semantics, has been developed. Such technologies can support automated agents in the case of simple, short queries and replies.
These solutions, however, fail to leverage rich discourse-related information to answer questions, perform dialogue management, provide recommendations, or implement "chatbot" systems, because existing solutions fail to successfully select, from a set of topically suitable utterances, the utterance that maintains the proper dialogue logic of the conversation, instead selecting rhetorically incoherent utterances. Therefore, new solutions are needed that accurately establish rhetorical agreement between questions and answers.
Disclosure of Invention
In general, the systems, devices, and methods of the present invention relate to utilizing dialogue discourse trees to manage dialogues.
A computer-implemented method for dialogue management using dialogue discourse trees is disclosed. The computer-implemented method may include receiving, from a user device, a request comprising an utterance of a dialogue between two entities. The computer-implemented method may include generating a dialogue instance based at least in part on merging a plurality of utterances previously provided by either of the two entities. In some embodiments, the plurality of utterances comprises the utterance of the request. The computer-implemented method may include identifying, from a corpus of candidate responses, a set of candidate responses for the utterance of the request. The computer-implemented method may include generating, for a candidate response of the set of candidate responses, a dialogue discourse tree for the dialogue instance and the candidate response. In some embodiments, the dialogue discourse tree comprises nodes corresponding to elementary discourse units that represent text fragments of the plurality of utterances and the candidate response, at least one non-terminal node of the nodes of the dialogue discourse tree represents a rhetorical relationship between two elementary discourse units, and each terminal node of the nodes of the dialogue discourse tree is associated with an elementary discourse unit. In some embodiments, the dialogue discourse tree comprises a node representing at least one dialogue-specific rhetorical relationship between two utterances of the dialogue instance. The computer-implemented method may include classifying the dialogue discourse tree of the candidate response using a first machine learning model that was previously trained using a supervised learning technique and a training data set comprising a plurality of dialogue discourse trees previously labeled as valid or invalid. The computer-implemented method may include providing the candidate response in response to the request based at least in part on classifying the dialogue discourse tree of the candidate response.
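The end-to-end flow of this method can be summarized in code. The following is a minimal Python sketch; the helper callables (candidate_responses, build_ddt, ddt_classifier), which stand in for the candidate-response corpus, the DDT generator, and the trained classifier, are hypothetical names introduced for illustration, not part of the disclosure.

```python
from typing import Callable, List, Optional

def respond(request_utterance: str, history: List[str],
            candidate_responses: Callable, build_ddt: Callable,
            ddt_classifier) -> Optional[str]:
    """Sketch of the dialogue-management loop described above."""
    # Merge the utterances previously provided by either entity with the request
    dialogue_instance = history + [request_utterance]
    for candidate in candidate_responses(request_utterance):
        # Build a dialogue discourse tree (DDT) over the dialogue plus the candidate
        ddt = build_ddt(dialogue_instance + [candidate])
        # A model trained on DDTs labeled valid/invalid decides whether the
        # rhetorical flow is preserved
        if ddt_classifier.predict(ddt) == "valid":
            return candidate
    return None  # no candidate preserves the rhetorical flow
```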
In some embodiments, identifying the set of candidate responses from the corpus of candidate responses further comprises: 1) determining a first communicative discourse tree comprising a question root node for the utterance of the request, 2) determining a second communicative discourse tree for the candidate response, the second communicative discourse tree comprising an answer root node, 3) responsive to determining that the question root node and the answer root node are identical, merging the first communicative discourse tree and the second communicative discourse tree to form a merged communicative discourse tree, 4) computing a level of complementarity between the first communicative discourse tree and the second communicative discourse tree by providing the merged communicative discourse tree to a second machine learning model that was previously trained to determine a level of complementarity between sub-trees of two communicative discourse trees, and 5) responsive to determining that the level of complementarity is above a threshold, identifying the utterance of the request and the candidate response as complementary. A sketch of this check appears below.
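The following Python sketch outlines steps 1-5 under simplifying assumptions; build_cdt and complementarity_model are hypothetical helpers, and the merged CDT is represented here simply as the pair of trees.

```python
def is_complementary(request: str, candidate: str,
                     build_cdt, complementarity_model,
                     threshold: float = 0.5) -> bool:
    """Sketch of the complementarity check; all helper objects are hypothetical."""
    q_cdt = build_cdt(request)    # CDT of the question, with a question root node
    a_cdt = build_cdt(candidate)  # CDT of the answer, with an answer root node
    if q_cdt.root_label != a_cdt.root_label:
        return False              # root nodes must be identical before merging
    merged = (q_cdt, a_cdt)       # stand-in for the merged CDT
    score = complementarity_model.score(merged)  # model trained on sub-tree pairs
    return score > threshold      # step 5: complementary if above the threshold
```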
In some embodiments, classifying the dialogue discourse tree comprises classifying the dialogue discourse tree of the candidate response as valid or invalid, wherein a valid classification indicates that a proper rhetorical flow is preserved among the utterances corresponding to the dialogue discourse tree, and wherein an invalid classification indicates that the proper rhetorical flow among the utterances corresponding to the dialogue discourse tree is broken.
In some embodiments, the computer-implemented method includes generating the training data set for the first machine learning model based at least in part on generating a plurality of dialogue instances from a corpus of documents. Generating a dialogue instance from a document may further comprise: 1) splitting input text of the document into a set of text fragments, 2) generating a communicative discourse tree for a text fragment of the set of text fragments, 3) identifying a set of satellite elementary discourse units of the communicative discourse tree of the text fragment, 4) selecting an entity or attribute from a satellite elementary discourse unit, 5) generating a query from the entity or attribute selected from the satellite elementary discourse unit, 6) executing the query against a knowledge base, 7) generating a question corresponding to the satellite elementary discourse unit based at least in part on one or more search results obtained from executing the query, 8) updating the communicative discourse tree based at least in part on inserting the question as a new node that is inserted based at least in part on the satellite elementary discourse unit, and 9) generating the dialogue instance using the updated communicative discourse tree, or any suitable combination of the above. A sketch of this procedure follows.
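A minimal Python sketch of this dataset-construction loop; split_text, build_cdt, satellite_edus, kb_search, form_question, and the methods on the CDT object are hypothetical stand-ins for the enumerated steps.

```python
def dialogues_from_document(text, split_text, build_cdt,
                            satellite_edus, kb_search, form_question):
    """Turn a document into synthetic dialogue instances labeled 'valid'."""
    instances = []
    for fragment in split_text(text):            # step 1: split into fragments
        cdt = build_cdt(fragment)                # step 2: CDT per fragment
        for edu in satellite_edus(cdt):          # step 3: satellite EDUs
            entity = edu.pick_entity_or_attribute()         # step 4
            results = kb_search(entity)                     # steps 5-6: query a KB
            question = form_question(edu, results)          # step 7
            cdt.insert_question_node(question, anchor=edu)  # step 8: new node
        instances.append((cdt.to_dialogue(), "valid"))      # step 9 + labeling
    return instances
```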
The computer-implemented method may include generating a second dialogue discourse tree based at least in part on the dialogue instance. The second dialogue discourse tree may comprise second nodes corresponding to second elementary discourse units that represent second text fragments of the dialogue instance, at least one non-terminal node of the nodes of the second dialogue discourse tree represents a respective rhetorical relationship between two elementary discourse units, and each terminal node of the nodes of the second dialogue discourse tree is associated with an elementary discourse unit. In some embodiments, the second dialogue discourse tree comprises a node representing at least one dialogue-specific rhetorical relationship between two utterances of the second dialogue discourse tree. The computer-implemented method may include associating the second dialogue discourse tree with a label indicating that the second dialogue discourse tree is valid, and storing the second dialogue discourse tree and the label as part of the training data set with which the first machine learning model is trained.
In some embodiments, generating the dialogue discourse tree may comprise 1) generating a discourse tree for the candidate response, the discourse tree comprising a set of nodes, each non-terminal node of the set of nodes of the discourse tree representing a respective rhetorical relationship between two elementary discourse units, and each terminal node of the set of nodes of the discourse tree being associated with a particular elementary discourse unit, 2) identifying in the discourse tree a rhetorical relation of type elaboration or joint, wherein the rhetorical relation relates a first elementary discourse unit and a second elementary discourse unit, and wherein the first elementary discourse unit and the second elementary discourse unit form a reference sentence, 3) identifying an abstract meaning representation of a template based at least in part on identifying one or more common entities between the discourse tree and the abstract meaning representation of the template, 4) identifying a semantic relation corresponding to the rhetorical relation, wherein the semantic relation corresponds to a word of the template, and 5) replacing, in the discourse tree, the rhetorical relation with an updated rhetorical relation corresponding to the semantic relation, or any suitable combination of the above. A sketch of this rewriting step follows.
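A Python sketch of the DT-to-DDT rewriting step; the tree and template objects, and the semantic_relation_for helper, are hypothetical. The key move is replacing a generic elaboration/joint relation with the more specific relation suggested by a matching AMR template.

```python
def to_ddt(discourse_tree, amr_templates, semantic_relation_for):
    """Refine default rhetorical relations using AMR templates (steps 1-5)."""
    for node in discourse_tree.nonterminals():
        if node.relation not in ("elaboration", "joint"):          # step 2
            continue
        sentence = node.left_edu.text + " " + node.right_edu.text  # reference sentence
        for template in amr_templates:
            # Step 3: match on entities shared by the sentence and the template
            if template.common_entities(sentence):
                # Steps 4-5: e.g., elaboration -> cause, purpose, condition, ...
                node.relation = semantic_relation_for(template, node.relation)
                break
    return discourse_tree  # now a dialogue discourse tree (DDT)
```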
In some embodiments, providing the candidate response as part of the dialogue and in response to the request is further based at least in part on determining that the candidate response is topically relevant to the utterance of the request.
The example method(s) discussed herein may be implemented on a system and/or device including one or more processors and/or stored as instructions on a non-transitory computer-readable medium. Various aspects of the disclosure may be implemented using a computer program product comprising computer programs/instructions that, when executed by a processor, cause the processor to perform any of the methods disclosed herein.
Drawings
FIG. 1 illustrates an exemplary dialogue classification environment in accordance with an aspect.
FIG. 2 depicts an example of a discourse tree in accordance with an aspect.
FIG. 3 depicts a further example of a discourse tree in accordance with an aspect.
FIG. 4 depicts illustrative schemas in accordance with an aspect.
FIG. 5 depicts a node-link representation of a hierarchical binary tree in accordance with an aspect.
FIG. 6 depicts an exemplary indented-text encoding of the representation in FIG. 5 in accordance with an aspect.
FIG. 7 depicts an exemplary DT for an example request about property tax in accordance with an aspect.
FIG. 8 depicts an exemplary response for the question represented in FIG. 7.
FIG. 9 illustrates a discourse tree for an official answer in accordance with an aspect.
FIG. 10 illustrates a discourse tree for a raw answer in accordance with an aspect.
FIG. 11 illustrates a communicative discourse tree for a claim of a first agent in accordance with an aspect.
FIG. 12 illustrates a communicative discourse tree for a claim of a second agent in accordance with an aspect.
FIG. 13 illustrates a communicative discourse tree for a claim of a third agent in accordance with an aspect.
FIG. 14 illustrates a parse tree in accordance with an aspect.
FIG. 15 illustrates an exemplary process for building a communicative discourse tree in accordance with an aspect.
FIG. 16 illustrates a discourse tree and a scenario graph in accordance with an aspect.
FIG. 17 illustrates forming a request-response pair in accordance with an aspect.
FIG. 18 illustrates a maximal common sub-communicative discourse tree in accordance with an aspect.
FIG. 19 illustrates a tree in a kernel learning format for a communicative discourse tree in accordance with an aspect.
FIG. 20 illustrates an exemplary process used to implement a rhetorical agreement classifier in accordance with an aspect.
FIG. 21 illustrates an exemplary process for dialogue management using a dialogue discourse tree (DDT), in accordance with at least one embodiment.
FIG. 22 illustrates an exemplary process for generating a conversation instance, in accordance with at least one embodiment.
FIG. 23 illustrates a communicative discourse tree (CDT) from which a DDT can be generated, in accordance with at least one embodiment.
FIG. 24 illustrates an exemplary discourse tree generated from text (e.g., text 128 of FIG. 1, a conversation instance, a dialogue formed by merging multiple utterances between a chatbot and a user), in accordance with at least one embodiment.
FIG. 25 illustrates an exemplary dialogue discourse tree generated from the discourse tree of FIG. 24, in accordance with at least one embodiment.
FIG. 26 illustrates an exemplary dialogue discourse tree generated for a dialogue, in accordance with at least one embodiment.
FIG. 27 depicts a discourse tree and a semantic tree in accordance with an aspect.
FIG. 28 depicts a discourse tree and a semantic tree in accordance with an aspect.
FIG. 29 is a flow diagram of an exemplary process for generating a dialogue discourse tree in accordance with an aspect.
FIG. 30 depicts a generalization of a sentence and a template with a known semantic relation in accordance with an aspect.
FIG. 31 depicts an alignment between two sentences in accordance with an aspect.
FIG. 32 is a flow diagram depicting an exemplary process for generating a dialogue discourse tree corresponding to a candidate response, in accordance with at least one embodiment.
FIG. 33 is a flow diagram depicting an exemplary process for dialogue management using a dialogue discourse tree (DDT), in accordance with at least one embodiment.
FIG. 34 depicts a simplified diagram of a distributed system for implementing one of the aspects.
FIG. 35 is a simplified block diagram of components of a system environment by which services provided by the components of a system in accordance with an aspect may be offered as cloud services.
FIG. 36 illustrates an exemplary computer system in which various aspects of the present invention may be implemented.
Detailed Description
Aspects disclosed herein provide technical improvements to the area of computer-implemented linguistics. More specifically, the present disclosure relates to generating and utilizing dialogue discourse trees (DDTs) to represent dialogues and to manage rhetorical flow. The DDTs discussed herein are generated based at least in part on previous utterances and are capable of maintaining appropriate dialogue logic in a dialogue.
Chatbots have received much attention from academic researchers and have achieved remarkable success in a myriad of industry scenarios, such as chit-chat, information seeking and retrieval, and intelligent assistants. Social dialogue systems are becoming robust and reliable and are widely used for conversing with humans. In recent years, this progress has mainly been driven by advances in neural generation, which can handle a wide variety of user utterances and provide meaningful chatbot responses. Users want their interactions with these conversational agents to resemble real social relationships.
Modern approaches to dialogue systems can be categorized into two groups: 1) domain-specific and 2) open-domain. Domain-specific models generally pursue solving and accomplishing one specific target (e.g., restaurant reservation, traffic or social promotion, etc.), which depends on domain knowledge and engineering. Open-domain dialogue, in contrast, involves unlimited topics within a conversation. Building an open-domain dialogue system is therefore more challenging, owing to the lack of sufficient knowledge engineering. Leveraging the large number of available dialogue data sets to build open-domain dialogue systems has attracted growing interest in the NLP community. In open-domain dialogue systems, generation-based and retrieval-based methods are the mainstream in industry. Generation-based methods learn to create a feasible response for a user-issued query, while retrieval-based methods extract a proper response from a set of available candidate utterances.
Unlike "general responses" produced by generative models, search-based methods can extract smooth and informative responses from human conversations. Early search-based approaches mainly addressed the single-turn response selection problem, where only one utterance was contained in the context of the conversation.
Generation-based approaches generate responses with natural language generation models learned from conversation data, while retrieval-based approaches re-use existing responses by selecting a proper response from an index of the conversation data. While retrieval-based chatbots have the advantage of returning informative and fluent responses, they perform response selection in single-turn conversations, which ignores the conversation history. The present disclosure relates to multi-turn response selection, which takes a message and the utterances of the previous turns as input and selects a response that is natural and relevant to the whole context of the conversation.
The problem of finding the best utterance is at least two-dimensional. In previous data-driven approaches, when both dimensions were handled simultaneously, topical relevance was handled well, while the rhetorical flow was random and uncontrolled. This is because there is a substantial abstraction gap between the syntactic level available to the learning system and the discourse level that controls the rhetorical flow. It is hard to accumulate enough data to learn the rhetorical flow from the underlying syntactic level. Maintaining a proper dialogue flow is a fundamental task of a dialogue manager (DM). The disclosed techniques separate the rhetorical flow from the topical relevance. A dialogue discourse tree (DDT) may be generated to represent the utterances of a dialogue. DDTs can be used to assess rhetorical agreement among the multiple utterances of a dialogue so that the rhetorical flow is maintained. Using DDTs, a natural utterance may be selected that is relevant in the context of the whole conversation. The use of these DDTs allows a chatbot to successfully select utterances that maintain the proper dialogue logic of the conversation (e.g., topically suitable and rhetorically coherent with the other utterances of the dialogue).
There are other problems with dialogue management based on end-to-end learning. The lack of controllability of neural generation methods makes it hard to reliably use them to introduce new information in the course of asking or answering the utterance of a user request. When current chatbots (explicitly provided with new factual content) introduce things into a conversation, the responses they generate do not acknowledge the previous turns. The reason is as follows: while current approaches are trained with two contexts, new factual content and conversation history, the responses generated by the chatbot are not directed at both the content and the conversation history. Chatbots in most cases lack specificity with respect to the conversation history.
Although deep learning (DL) models outperform baseline approaches in most cases, several major dialogue management issues in terms of logical consistency remain insufficiently addressed. End-to-end models adequately represent context and response at the semantic level, but little attention is paid to logical consistency. In most corpora, this leads to a number of bad cases. For example, in one conversation history, one of the speakers might say that he believes certain merchandise on eBay is fake, and the expected answer might ask why he dislikes fake shoes. However, most neural models select responses such as "It is not fake. I am just worried about the production date." This response is logically inconsistent with the context, since it states that the shoes are not fake, which contradicts the context. The reason behind this is that most neural models only handle the semantics of context-response pairs. When selecting a response, logic, attitude, and sentiment are not taken into account. While most popular methods filter candidates word by word, at the syntactic and semantic levels, the disclosed techniques perform this filtering on the overall logical flow, at the level of discourse. This allows, to a certain degree, the selection of a response that is consistent with the previous utterances of the dialogue.
Certain definitions
As used herein, "revising structural theory" is a field of research and learning that provides a theoretical basis by which the coherence of a utterance can be analyzed.
As used herein, a "speech tree" or "DT" refers to a structure of lexical associations of sentences that represent a portion of a sentence.
As used herein, "lexical association," "lexical relationship," or "coherent association" or "speech association" refers to how two segments of a speech are logically connected to each other. Examples of lexical associations include exposition, contrast, and attribution.
As used herein, a "sentence fragment" or "fragment" is a portion of a sentence that can be separated from the rest of the sentence. A segment is a basic speech unit. For example, for the sentence "[ People 1] say that the sight position to [ Organization 1] say that [ Organization 1] should be responsible for [ Event 1]", the two fragments are "[ People 1] say that the sight position to [ Organization 1] and" as meeting priority for [ Event 1] ". A fragment may include, but need not include, a verb.
As used herein, "signature (signature)" or "skeleton" refers to the property of a verb in a fragment. Each signature may include one or more subject roles. For example, for the segment "[ Peope 1]" say present points to [ Organization 1] ", the verb is" say ", and the signature for a particular use of this verb" say "might be" agent verbic ", where" [ Peope 1] "is an agent and" evidence "is a subject.
As used herein, "topic role" refers to a component of a signature that describes the role of one or more words. Continuing with the previous example, "agent" and "topic" are topic roles.
As used herein, "core" refers to which text segment, fragment, or section (span) is more important for the author's purposes. The core component (nucleous) is the more central segment, while the satellite component (satellite) is less central.
As used herein, "consistency" refers to the association of two formulations together.
The "communicating (communicative) speech tree" or CDT includes a speech tree that is supplemented by communicating actions. Communicative actions are cooperative actions taken by individuals on a mutual negotiation and demonstration basis.
As used herein, a "communication verb" is a verb that indicates communication. For example, the verb "deny" is an exchange verb.
As used herein, a "communicative action" describes an action performed by one or more agents and the subject of an agent.
FIG. 1 depicts an exemplary dialogue classification environment in accordance with an aspect. FIG. 1 depicts computing device 101, computing device 103, and dialogue 130. Computing devices 101 and 103 may be any suitable computing device, such as a mobile phone, smart phone, tablet, laptop, smart watch, and the like. Computing devices 101 and 103 may communicate via a data network, including any public or private network, wired or wireless network, wide area network, local area network, cellular network, the internet, etc. In some embodiments, a user may provide speech or text at computing device 103 and receive responses from computing device 101. Computing device 101 includes one or more of application 102, dialogue discourse tree (DDT) generator 104, DDT processing module 106, answer database 105, rhetorical agreement classifier 120, dialogue classifier 122, and training data 125. In some embodiments, application 102 may be an example of a chatbot. Examples of computing devices include devices 3402, 3404, 3406, and 3408, and cloud infrastructure system 3502 and client devices 3504, 3506, depicted in FIGS. 34 and 35, respectively.
In one example, application 102 may invoke the functionality of the dialogue discourse tree (DDT) generator 104, which generates a discourse tree from dialogue 130. Dialogue 130 may include any suitable number of utterances of a dialogue (e.g., a chatbot dialogue in which the user and the chatbot exchange utterances). DDT generator 104 analyzes the discourse tree and generates semantic representations such as abstract meaning representation (AMR) graphs. AMR is a semantic representation language. An AMR graph is a rooted, labeled, directed acyclic graph (DAG) comprising a whole sentence. From the AMR graph, using the techniques disclosed herein, DDT generator 104 generates a DDT, which in turn may be used to perform dialogue discourse analysis. Example processes for creating a DDT are discussed in connection with FIGS. 24-30. DDT processing module 106 may identify, from answer database 105, a number of utterance candidates to add to the dialogue. DDT processing module 106 may iterate over the utterance candidates, invoking the functionality of DDT generator 104 to generate, for each candidate, a DDT that includes dialogue 130 and the utterance candidate. Each DDT corresponding to an utterance candidate may be classified or otherwise scored. Application 102 may respond with the candidate utterance selected based on the classification or score.
In another example, application 102 answers questions received via a chat session. The dialog 130 may be a stream of questions and answers. In some embodiments, DDT generator 104 creates a DDT from dialog 130 and selects one or more candidate answers from answer database 105. Any suitable portion of the dialog 130 may be generated by any mobile device, such as a mobile phone, a smart phone, a tablet computer, a laptop computer, a smart watch, and so forth. The mobile device may communicate with the computing device 101 via a data network. In this manner, the mobile device may provide questions (e.g., from the user) to the computing device 101.
Continuing with the example, DDT processing module 106 determines the most suitable answer from the candidate answers. Different methods may be used. In an aspect, DDT processing module 106 may create, for each candidate answer, a DDT that includes dialogue 130 and the candidate answer. DDT processing module 106 may identify the best candidate answer (e.g., answer 132) as the one whose DDT indicates that the candidate answer most successfully maintains the rhetorical flow of the previous utterances provided in dialogue 130. Application 102 then sends text associated with the selected candidate answer (e.g., answer 132) to the mobile device.
In another example, DDT processing module 106 causes DDT generator 104 to select a candidate answer (e.g., answer 132). DDT processing module 106 uses the trained rhetorical agreement classifier 120 to determine whether the pairing of the candidate answer (e.g., answer 132) with a previous utterance of dialogue 130 scores above a threshold indicating how topically relevant the candidate answer is. If the candidate answer (e.g., answer 132) is deemed topically relevant, DDT processing module 106 may generate a DDT for the candidate answer. The DDT may include dialogue 130 and the candidate answer (e.g., answer 132). DDT processing module 106 provides the candidate answer to a predictive model such as dialogue classifier 122. In some embodiments, the first answer identified (e.g., answer 132) whose DDT has a score exceeding a threshold may be selected. If the first answer does not correspond to a DDT with a sufficient score, DDT processing module 106 may continue analyzing additional pairs comprising the question and a different candidate answer until a suitable answer is found. In other embodiments, a number of candidate answers may be used. The DDTs corresponding to these candidate answers may be scored, and the highest-scoring DDT may be selected. Application 102 may then provide the selected answer corresponding to the highest-scoring DDT (e.g., answer 132) to computing device 103 as part of dialogue 130.
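The two-stage selection just described (topical relevance first, rhetorical-flow scoring second) might look as follows in Python; relevance_score, build_ddt, and flow_score are hypothetical stand-ins for classifier 120, generator 104, and classifier 122, and the thresholds are arbitrary placeholders.

```python
def select_answer(dialogue, candidates, relevance_score, build_ddt, flow_score,
                  relevance_threshold=0.5, flow_threshold=0.5):
    """Filter candidates by topic first, then pick the best-scoring DDT."""
    best, best_score = None, flow_threshold
    for answer in candidates:
        if relevance_score(dialogue[-1], answer) < relevance_threshold:
            continue                          # not on topic; skip DDT construction
        ddt = build_ddt(dialogue + [answer])  # DDT over dialogue 130 + candidate
        score = flow_score(ddt)               # e.g., dialogue classifier 122
        if score > best_score:
            best, best_score = answer, score
    return best
```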
Rhetorical Structure Theory and discourse trees
Linguistics is the scientific study of language. For example, linguistics can include the structure of a sentence (syntax), e.g., subject-verb-object; the meaning of a sentence (semantics), e.g., dog bites man versus man bites dog; and what speakers do in conversation, i.e., discourse analysis, or the analysis of language beyond the sentence.
The theoretical underpinning of discourse, Rhetorical Structure Theory (RST), can be attributed to Mann, William and Thompson, Sandra, "Rhetorical Structure Theory: A Theory of Text Organization," Text-Interdisciplinary Journal for the Study of Discourse, 8(3):243-281, 1988. Similar to how the syntax and semantics of programming language theory helped enable modern software compilers, RST helped enable discourse analysis. More specifically, RST posits structure blocks on at least two levels: a first level such as nuclearity and rhetorical relations, and a second level of structures or schemas. Discourse parsers or other computer software can parse text into a discourse tree.
Rhetorical Structure Theory models the logical organization of text, a structure employed by a writer, relying on relations between parts of the text. RST simulates text coherence by forming a hierarchical, connected structure of texts by means of discourse trees. Rhetorical relations are split into classes of coordinate and subordinate relations; these relations hold across two or more text spans and therefore implement coherence. These text spans are called elementary discourse units (EDUs). Clauses in a sentence and sentences in a text are logically connected by the author. The meaning of a given sentence is related to those of the preceding and following sentences. This logical relation between clauses is called the coherence structure of the text. RST is one of the most popular theories of discourse and is based on a tree-like discourse structure, the discourse tree (DT). The leaves of a DT correspond to EDUs, the contiguous atomic text spans. Adjacent EDUs are connected by coherence relations (e.g., cause, sequence), forming higher-level discourse units. These units are then also subject to this relation linking. EDUs linked by a relation are then differentiated based on their relative importance: the nucleus is the core part of the relation, while the satellite is the peripheral one. As discussed, in order to determine accurate request-response pairs, both topical and rhetorical agreement are analyzed. When a speaker answers a question, such as a phrase or a sentence, the speaker's answer should address the topic of the question. In the case of an implicit formulation of a question via a seed text of a message, an appropriate answer is expected not only to maintain the topic but also to match the generalized epistemic state of the seed.
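For concreteness, a discourse tree with terminal nodes holding EDUs and non-terminal nodes holding relations with nucleus/satellite roles can be captured in a small data type. This Python sketch is illustrative only; the type and field names, and the example text, are invented here and are not part of RST or the disclosure.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class DTNode:
    relation: Optional[str] = None       # e.g., "elaboration"; None for leaves
    edu: Optional[str] = None            # text of an elementary discourse unit
    children: Tuple["DTNode", ...] = ()  # adjacent spans joined by the relation
    nucleus: Tuple[bool, ...] = ()       # True where the matching child is a nucleus

# Leaves carry EDUs; inner nodes carry relations over adjacent spans.
leaf1 = DTNode(edu="The conference will be held in June,")
leaf2 = DTNode(edu="so register early.")
tree = DTNode(relation="motivation", children=(leaf1, leaf2),
              nucleus=(False, True))
```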
Rhetorical relations
As discussed, aspects described herein use communicative discourse trees. Rhetorical relations can be described in different ways. For example, twenty possible relations are provided below. Other numbers of relations are possible.
[The table of twenty rhetorical relations, with their descriptions, is rendered as images in the original publication and is not reproduced here.]
Some empirical studies postulate that the majority of text is structured using nucleus-satellite relations. See Mann and Thompson. But other relations do not carry a definite selection of a nucleus. Examples of such relations are shown below.
Relation name | One span | Other span
Contrast | One alternative | The other alternative
Joint | (unconstrained) | (unconstrained)
List | An item | A next item
Sequence | An item | A next item
FIG. 2 depicts an example of a discourse tree in accordance with an aspect. FIG. 2 includes discourse tree 200. The discourse tree includes text span 201, text span 202, text span 203, relation 210, and relation 228. The numbers in FIG. 2 correspond to the three text spans. FIG. 2 corresponds to the following example text with three text spans numbered 1, 2, 3:
1. Honolulu, Hawaii will be site of the 2017 Conference on Hawaiian History
2. It is expected that 200 historians from the U.S. and Asia will attend
3. The conference will be concerned with how the Polynesians sailed to Hawaii
For example, relation 210, or elaboration, describes the relationship between text span 201 and text span 202. Relation 228 depicts the relationship, elaboration, between text spans 203 and 204. As depicted, text spans 202 and 203 elaborate further on text span 201. In the example above, given the goal of notifying readers of the conference, text span 1 is the nucleus. Text spans 2 and 3 provide more detail about the conference. In FIG. 2, a horizontal number, e.g., 1-3, 1, 2, 3, covers a span of text (possibly made up of further spans); a vertical line signals the nucleus or nuclei; and a curve represents a rhetorical relation (elaboration), with the direction of the arrow pointing from the satellite to the nucleus. If the text span functions only as a satellite and not as a nucleus, deleting the satellite still leaves a coherent text. If the nucleus is deleted from FIG. 2, text spans 2 and 3 are difficult to understand.
FIG. 3 depicts a further example of a discourse tree in accordance with an aspect. FIG. 3 includes components 361 and 362, text spans 365 through 367, relation 310, and relation 328. Relation 310, enablement, describes the relationship between components 366 and 365, and 367 and 365. FIG. 3 refers to the following text spans:
The new Tech Report abstracts are now in the journal area of the library near the abridged dictionary.
Please sign your name by any means that you would be interested in seeing.
Last day for sign-ups is 31 May.
As can be seen, relation 328 depicts the relationship, enablement, between components 367 and 366. FIG. 3 illustrates that while nuclei can be nested, there is only one most nuclear text span.
Constructing a discourse tree
Discourse trees can be generated in different ways. A simple example of a method to construct a DT bottom-up is:
(1) Divide the discourse text into units by:
(a) Unit size may vary, depending on the goals of the analysis
(b) Typically, units are clauses
(2) Examine each unit and its neighbors. Is there a clear relation holding between them?
(3) If yes, mark that relation.
(4) If not, the unit might be at the boundary of a higher-level relation. Look at relations holding between larger units (spans).
(5) Continue until all units in the text are accounted for.
A minimal code sketch of this bottom-up procedure appears below.
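The following Python sketch re-uses the DTNode type sketched earlier; relation_between is a hypothetical oracle that names the relation holding between two adjacent spans, or returns None. It assumes units is non-empty and is illustrative rather than a production discourse parser.

```python
def build_dt_bottom_up(units, relation_between):
    """Greedily merge adjacent units related by a rhetorical relation."""
    spans = [DTNode(edu=u) for u in units]   # step 1: one node per clause/unit
    merged = True
    while merged and len(spans) > 1:         # step 5: repeat until done
        merged = False
        for i in range(len(spans) - 1):
            rel = relation_between(spans[i], spans[i + 1])  # step 2
            if rel is not None:              # step 3: mark the relation
                spans[i:i + 2] = [DTNode(relation=rel,
                                         children=(spans[i], spans[i + 1]))]
                merged = True
                break
    # step 4: units with no local relation end up joined at a higher level
    if len(spans) == 1:
        return spans[0]
    return DTNode(relation="joint", children=tuple(spans))
```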
Mann and Thompson also describe a second level of building blocks, structures called schema applications. In RST, rhetorical relations are not mapped directly onto texts; they are fitted into structures called schema applications, and these in turn are fitted to text. Schema applications are derived from simpler structures called schemas (as shown in FIG. 4). Each schema indicates how a particular unit of text is decomposed into other smaller text units. A rhetorical structure tree, or DT, is a hierarchical system of schema applications. A schema application links a number of consecutive text spans and creates a complex text span, which can in turn be linked by a higher-level schema application. RST asserts that the structure of every coherent discourse can be described by a single rhetorical structure tree, whose top schema creates a span encompassing the whole discourse.
FIG. 4 depicts illustrative schemas in accordance with an aspect. FIG. 4 shows a joint schema, which is a list of items consisting of nuclei with no satellites. FIG. 4 depicts schemas 401 through 406. Schema 401 depicts a circumstance relation between text spans 410 and 428. Schema 402 depicts a sequence relation between text spans 420 and 421 and a sequence relation between text spans 421 and 422. Schema 403 depicts a contrast relation between text spans 430 and 431. Schema 404 depicts a joint relation between text spans 250 and 251. Schema 405 depicts a motivation relation between 260 and 261 and an enablement relation between 262 and 261. Schema 406 depicts a joint relation between text spans 270 and 272. FIG. 4 shows an example of a joint schema for the following three text spans:
Skies will be partly sunny in the New York metropolitan area today.
It will be more humid, with temperatures in the middle 80's.
Tonight will be mostly cloudy, with the low temperature between 65 and 70.
While FIGS. 2-4 depict some graphical representations of a discourse tree, other representations are possible.
FIG. 5 depicts a node-link representation of a hierarchical binary tree in accordance with an aspect. As can be seen in FIG. 5, the leaves of a DT correspond to contiguous non-overlapping text spans called elementary discourse units (EDUs). Adjacent EDUs are connected by relations (e.g., elaboration, attribution, etc.) and form larger discourse units, which are also connected by relations. Discourse analysis in RST involves two subtasks: discourse segmentation is the task of identifying the EDUs, and discourse parsing is the task of linking the discourse units into a labeled tree. Document-level discourse analysis combines intra-sentence and multi-sentence discourse parsing.
FIG. 5 depicts the text spans as leaves, or terminal nodes, of the tree, each numbered in the order in which they appear in the full text, as shown in FIG. 6. FIG. 5 includes tree 500. Tree 500 includes, for example, nodes 501 to 507. The nodes indicate relations. Nodes are either non-terminal, such as node 501, or terminal, such as nodes 502-507. As can be seen, nodes 503 and 504 are related by a joint relation. Nodes 502, 505, 506, and 508 are nuclei. The dashed lines indicate that a branch or text span is a satellite. The relations are the nodes in gray boxes.
FIG. 6 depicts an exemplary indented-text encoding of the representation in FIG. 5 in accordance with an aspect. FIG. 6 includes text 600 and text sequences 602 through 604. Text 600 is presented in a manner more amenable to computer programming. Text sequence 602 corresponds to node 502, sequence 603 corresponds to node 503, and sequence 604 corresponds to node 504. In FIG. 6, "N" indicates a nucleus and "S" indicates a satellite.
Examples of discourse parsers
Automatic discourse segmentation can be performed in different ways. For example, given a sentence, a segmentation model identifies the boundaries of the composite elementary discourse units by predicting whether a boundary should be inserted before each particular token in the sentence. For example, one framework considers each token in the sentence sequentially and independently. In this framework, the segmentation model scans the sentence token by token and uses a binary classifier, such as a support vector machine or logistic regression, to predict whether it is appropriate to insert a boundary before the token being examined. In another example, the task is treated as a sequential labeling problem. Once the text is segmented into elementary discourse units, sentence-level discourse analysis can be performed to construct the discourse tree. Machine learning techniques may be used.
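The token-by-token boundary prediction just described can be sketched with a standard binary classifier. Feature extraction here is deliberately minimal (the token and its left neighbor), and scikit-learn is used purely as an illustration of the logistic-regression variant, not as the parser used by the disclosure.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def boundary_features(tokens, i):
    # Minimal context features for "insert an EDU boundary before token i?"
    return {"tok": tokens[i], "prev": tokens[i - 1] if i else "<s>"}

def train_segmenter(sentences, boundaries):
    """sentences: list of token lists; boundaries: list of sets of boundary indices."""
    X, y = [], []
    for tokens, bset in zip(sentences, boundaries):
        for i in range(1, len(tokens)):      # scan the sentence token by token
            X.append(boundary_features(tokens, i))
            y.append(1 if i in bset else 0)  # binary label: boundary or not
    vec = DictVectorizer()
    clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X), y)
    return vec, clf
```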
In one aspect of the present invention, two Rhetorical Structure Theory (RST) discourse parsers may be used: CoreNLPProcessor, which relies on constituent syntax, and FastNLPProcessor, which uses dependency syntax.
In addition, these two discourse parsers (i.e., CoreNLPProcessor and FastNLPProcessor) use Natural Language Processing (NLP) for syntactic parsing. For example, Stanford CoreNLP gives the base forms of words; their parts of speech; whether they are names of companies, people, etc.; normalizes dates, times, and numeric quantities; marks up the structure of sentences in terms of phrases and syntactic dependencies; and indicates which noun phrases refer to the same entities. In practice, RST is a still-evolving theory that may work in many cases of discourse but may not work in some others. There are many variables, including, but not limited to, what the EDUs in the text of interest are, i.e., which discourse segmenter is used; which relation inventory is used and which relations are selected for the EDUs; the corpus of documents used for training and testing; and even which parser is used. Thus, for example, in the above-referenced "Two Practical Rhetorical Structure Theory Parsers" paper by Surdeanu et al., tests must be run on a particular corpus using tailored metrics to determine which parser gives better performance. Thus, unlike computer language parsers, which give predictable results, discourse parsers (and segmenters) can give unpredictable results depending on the training and/or testing text corpus. Discourse trees are therefore a mixture of the predictable arts (e.g., compilers) and the unpredictable arts (e.g., like chemistry, where experimentation is needed to determine which combinations give the desired results).
In order to objectively determine how good a discourse analysis is, a series of metrics is used. Precision, or positive predictive value, is the fraction of relevant instances among the retrieved instances, while recall (also known as sensitivity) is the fraction of the relevant instances that are retrieved. Both precision and recall are therefore based on an understanding and measure of relevance. Suppose a computer program for recognizing dogs in photographs identifies eight dogs in a picture containing 12 dogs and some cats. Of the eight identified as dogs, five actually are dogs (true positives), while the rest are cats (false positives). The program's precision is 5/8, while its recall is 5/12. When a search engine returns 30 pages, only 20 of which are relevant, while failing to return 40 additional relevant pages, its precision is 20/30 = 2/3, while its recall is 20/60 = 1/3. In this case, precision is "how useful the search results are," and recall is "how complete the results are." The F1 score (also F-score or F-measure) is a measure of a test's accuracy. It considers both the precision and the recall of the test to compute the score: F1 = 2 x ((precision x recall) / (precision + recall)), the harmonic mean of precision and recall. An F1 score reaches its best value at 1 (perfect precision and recall) and worst at 0.
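The figures in the examples above can be checked directly:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Dog detector: 8 returned, 5 correct, 12 relevant items in total
p, r = 5 / 8, 5 / 12          # precision 0.625, recall ~0.417
print(f1(p, r))               # 0.5

# Search engine: 30 returned, 20 relevant; 40 relevant pages missed
p, r = 20 / 30, 20 / 60       # 2/3 and 1/3
print(f1(p, r))               # ~0.444
```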
Autonomous agents or chatbots
A conversation between human A and human B is a form of discourse. For example, besides the more traditional email and voice conversations, a conversation between A and B can typically be conducted via messages in applications such as Messenger, Slack, etc. A chatbot (which may also be called an intelligent bot, virtual assistant, etc.) is an "intelligent" machine that, for example, replaces human B and, to various degrees, mimics a conversation between two humans. Application 102 of FIG. 1 may be an example of a chatbot. An example ultimate goal is that human A cannot tell whether B is a human or a machine. Discourse analysis, artificial intelligence (including machine learning), and natural language processing have made great strides toward the long-term goal of passing the Turing test. Of course, as computers become increasingly capable of searching and processing vast repositories of data and performing complex analyses on the data, including predictive analyses, the long-term goal is for chatbots to be human-like combined with the power of a computer.
For example, users can interact with an intelligent bot platform through conversational interaction. This interaction, also called a conversational user interface (UI), is a dialogue between the end user and the chatbot, just as between two human beings. It could be as simple as the end user saying "Hello" to the chatbot and the chatbot responding with "Hi" and asking the user how it can help, or it could be a transactional interaction in a banking chatbot (such as transferring money from one account to another), an informational interaction in an HR chatbot (such as checking a vacation balance), or an interaction about frequently asked questions (FAQs) in a retail chatbot (such as how to handle returns). Natural Language Processing (NLP) and Machine Learning (ML) algorithms, combined with other approaches, can be used to classify end-user intent. An intent at a high level is what the end user wants to accomplish (e.g., get an account balance, make a purchase). An intent is essentially a mapping of customer input to a unit of work that the backend should perform. Therefore, based on the phrases uttered by the user in the chatbot, these are mapped to specific and discrete use cases or units of work (e.g., check balance, transfer money, and track spending). The chatbot should support and be able to work out which unit of work should be triggered by the free-text entry that the end user types in natural language.
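As a toy illustration of mapping free-text phrases to discrete units of work, consider the following Python sketch; the intent labels and training phrases are invented for the example, and a simple TF-IDF plus logistic-regression pipeline stands in for whatever classifier a real platform would use.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

phrases = ["what is my balance", "how much money do I have",
           "send $50 to savings", "move money between accounts",
           "where did my paycheck go", "show my recent spending"]
intents = ["check_balance", "check_balance",
           "transfer_money", "transfer_money",
           "track_spending", "track_spending"]

# Map free-text user phrases to discrete units of work (intents)
intent_clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
intent_clf.fit(phrases, intents)
print(intent_clf.predict(["transfer one hundred dollars to checking"]))
# -> ['transfer_money']
```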
The underlying rationale for having an AI chatbot respond like a human is that the human brain can formulate and understand a request and then respond to a human request much better than a machine can. Thus, if human B were mimicked, a chatbot's requests/responses should be significantly improved. So the first part of the problem is: how does the human brain formulate and understand requests? To mimic this, a model is needed. RST and DT allow this to be done in a formal and repeatable way.
At a high level, there are typically two types of requests: (1) a request to perform some action; and (2) a request for information, e.g., a question. The first type has a response in which a unit of work is created. The second type has a response that is, for example, a good answer to the question. For instance, in some aspects the answer could take the form of the AI constructing an answer from its extensive knowledge base(s), or of matching the best existing answer by searching the internet or an intranet or other publicly/privately available data sources.
Communicative discourse trees and the rhetorical classifier
Aspects of the present disclosure build communicative discourse trees and use communicative discourse trees to analyze whether the rhetorical structure of a request or question agrees with that of an answer. More specifically, aspects described herein create representations of utterance candidates, learn the representations, and relate potential candidates into classes of valid or invalid. In this manner, an autonomous agent can receive a question from a user, process the question (for example, by searching for multiple answers), determine the best answer among the answers, and provide the answer to the user. The best answer is selected not only because it is topically relevant, but also because it is rhetorically consistent with the previous utterances.
More specifically, to represent the linguistic features of text, aspects described herein use rhetorical relations and speech acts (or communicative actions). Rhetorical relations are relationships between parts of sentences, typically obtained from a discourse tree. Speech acts are obtained as verbs from a verb resource such as VerbNet. By using both rhetorical relations and communicative actions, aspects described herein can correctly recognize valid request-response pairs. To do so, aspects correlate the rhetorical structure of a question with the rhetorical structure of an answer. By using this structure, a better answer can be determined. The traditional representation of discourse can be extended to dialogue, such that a discourse representation as a whole can be classified as a valid, coherent dialogue with a proper rhetorical flow, versus an invalid, incoherent dialogue without a logical flow.
Analyzing request and response pairs
FIG. 7 depicts an exemplary DT for an example request about property tax in accordance with an aspect. The node labels are the relations, and the arrowed lines point to the satellites. The nuclei are the solid lines. FIG. 7 depicts the following text.
Requesting: "My hugs 'grad heat sink his grade and book his grade refers to My hugging unpaid find on a license, he wa not able to get the track put in his name, I wa to put in My name and the paper property tax and the goal entity for the track, B the time to get from the track to the track off the title and the track, I di n' to have the track to the one of his own name, we do not have the track to the track, both the two to the customers, I from the one to the one of his family of names, both the two to the family of their parents, I from the one to the one of his family of parents of the track, now, because of these circumstances, I am just coming to the area of the asset and was left to return a good to a good will you be unable to afford a return? (I just want to know, since I will not be registering the truck, if there is a possibility of returning property taxes?
Responding: "The property tax is approved on property that You own" (property tax is collected on property owned) Just before You want You' ow it, so The tax is not trusted in The vehicle year, you are not trusted in The boundaries of The tax discrist, so The tax is available, even If You have not returned The vehicle to You, you still have it in The tax area, thus requiring tax payment.) not Just The all states project, you a limited amount of money, time to transfer title and path The use tax.
As can be seen from FIG. 7, analyzing the above text yields the following. "My husbands' grandmother gave him his grandfathers truck" is elaborated by "She signed the title over but due to my husband", which in turn is elaborated by "having unpaid fines on his license, he was not able to get the truck put in his name", which in turn is elaborated by "I wanted to put in my name", "and paid the property tax", and "and got insurance for the truck".
"My husbands' grandmother gave him his grandfathers truck. She signed the title over but due to my husband having unpaid fines on his license, he was not able to get the truck put in his name. I wanted to put in my name and paid the property tax and got insurance for the truck." is elaborated as follows: "I didn't have the money" is elaborated by "to do so", and "by the time" is elaborated by "it came to sending off the title" and "getting the tag".
The span ending with "... I didn't have the money to do so" is contrasted with "Now, due to circumstances, I am not going to be able to afford the truck", which is elaborated by "I went to the insurance place" and "was refused a refund". The whole preceding text is then elaborated by the final passage: "I am just wondering" stands in an attribution relation with "that" and the same unit "is it possible to get the property tax refunded?", which is conditioned by "since I am not going to have a tag on this truck".
As can be seen, the topic is "property tax on a car." The question includes a contradiction: on the one hand, all property is taxable, and on the other hand, the ownership is somewhat incomplete. A good response must address both topics of the question and clarify the inconsistency. To do that, the responder makes an even stronger claim concerning the necessity to pay tax on whatever is owned, irrespective of registration status. This example is a member of a positive training set from the answer-evaluation domain (e.g., Yahoo! Answers). The reader can observe that, because the question includes the rhetorical relation of contrast, the answer has to match it with a similar contrast relation to be convincing. Otherwise, this answer would look incomplete even to those who are not domain experts.
FIG. 8 depicts an exemplary response to the question represented in FIG. 7, in accordance with certain aspects of the present invention. The central nucleus is "the property tax is assessed on property," elaborated by "that you own." "The property tax is assessed on property that you own" is also a nucleus elaborated by "Just because you chose to not register it does not mean that you don't own it, so the tax is not refundable. Even if you have not titled the vehicle yet, you still own it within the boundaries of the tax district, so the tax is payable."
The nucleus "The property tax is assessed on property that you own. Just because you chose to not register it does not mean that you don't own it, so the tax is not refundable. Even if you have not titled the vehicle yet, you still own it within the boundaries of the tax district, so the tax is payable. Note that all states give you a limited amount of time to transfer title and pay the use tax." is elaborated by "there will be penalties on top of the normal taxes and fees" with condition "If you apply late," which in turn is elaborated by the contrast of "but you absolutely need to title it within the period of time stipulated in state law" and "you don't need to register it at the same time."
Comparing the DT of FIG. 7 and the DT of FIG. 8, one can determine how well the response (FIG. 8) matches the request (FIG. 7). In some aspects of the invention, the above framework is used, at least in part, to determine the DTs for a request/response and the rhetoric agreement between the DTs.
In another example, the question "What does [Organization 1] do" has at least two answers, for example an official answer and an actual answer.
FIG. 9 illustrates a discourse tree for the official answer, according to an aspect. As depicted in FIG. 9, the official answer, or mission statement, states that "[Organization 1] is the [A] agency and has responsibility for [doing B1] and [doing B2], and is responsible for [doing C]."
FIG. 10 illustrates a discourse tree for the raw answer, according to an aspect. As depicted in FIG. 10, another, perhaps more honest, answer states that "[Organization 1] is supposed to [do D]. However, [Organization 1] is charged with [negative activity A] and [negative activity B]. Not only that, they are also involved in [more negative activities], with the results of [Organization 1]'s activities including [F]."
The choice of answer depends on the context. The rhetorical structure allows distinguishing between the "official," template-based answer and the "actual," "raw," "report from the field," or "controversial" answer (see FIGS. 9 and 10). Sometimes, the question itself can give a hint about which category of answer is expected. If a question is formulated as a factoid or definitional one, without a second level of meaning, then the first category of answer is suitable. Otherwise, if a question has the meaning "tell me what it really is," then the second category is appropriate. In general, after extracting the rhetorical structure from a question, it is easier to select a suitable answer that has a similar, matching, or complementary rhetorical structure.
The official answer is based on statements and joins that are neutral in terms of any controversy the text may contain (see FIG. 9). At the same time, the raw answer includes a contrast relation. The relation is extracted from the phrases expressing what the agent is expected to do and what the agent is discovered to have done.
Classification of request-response pairs
The application 102 can determine whether a given answer or response (such as an answer obtained from answer database 105 or a public database) is responsive to a given question or request. More specifically, application 102 analyzes whether a request-response pair is correct or incorrect by determining one or both of (i) relevance or (ii) rhetoric agreement between the request and the response. Rhetoric agreement can be analyzed without considering relevance, which can be treated orthogonally.
Application 102 can determine the similarity between question-answer pairs using different methods. For example, application 102 can determine a level of similarity between an individual question and an individual answer. Alternatively, application 102 can determine a measure of similarity between a first pair including a question and an answer and a second pair including a question and an answer.
For example, application 102 uses rhetoric agreement classifier 120, which is trained to predict matching or non-matching answers. Application 102 can process two pairs at a time, for example <q1, a1> and <q2, a2>. Application 102 compares q1 with q2 and a1 with a2, producing a combined similarity score. Such a comparison allows determining whether an unknown question/answer pair contains a correct answer by assessing its distance from another question/answer pair with a known label. In particular, an unlabeled pair <q2, a2> can be processed so that, rather than "guessing" correctness based on words or structures shared by q2 and a2, both q2 and a2 can be compared with the corresponding components q1 and a1 of the labeled pair <q1, a1> on the basis of such words or structures. Because this approach targets a domain-independent classification of an answer, only the structural cohesiveness between a question and an answer can be leveraged, not the "meanings" of answers.
In one aspect, application 102 uses training data 125 to train rhetoric agreement classifier 120. In this manner, rhetoric agreement classifier 120 is trained to determine a similarity between pairs of questions and answers. This is a classification problem. Training data 125 can include a positive training set and a negative training set. Training data 125 includes matching request-response pairs in a positive dataset, and arbitrary or lower-relevance or lower-appropriateness request-response pairs in a negative dataset. For the positive dataset, various domains with distinct acceptance criteria are selected that indicate whether an answer or response is suitable for the question.
Each training dataset includes a set of training pairs. Each training pair includes a question communicative discourse tree that represents a question, an answer communicative discourse tree that represents an answer, and an expected level of complementarity between the question and the answer. By using an iterative process, application 102 provides a training pair to rhetoric agreement classifier 120 and receives a level of complementarity from the model. Application 102 calculates a loss function by determining a difference between the determined level of complementarity and the expected level of complementarity for the particular training pair. Based on the loss function, application 102 adjusts internal parameters of the classification model to minimize the loss function.
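The following minimal Python sketch illustrates the iterative training loop just described, assuming PyTorch as the learning framework. The model name RhetoricAgreementModel, the feature encoding of a CDT pair, and the choice of mean-squared-error loss are illustrative assumptions, not part of any aspect described herein.

    import torch
    import torch.nn as nn

    class RhetoricAgreementModel(nn.Module):
        """Predicts a complementarity level for an encoded <question CDT, answer CDT> pair."""
        def __init__(self, n_features: int):
            super().__init__()
            self.scorer = nn.Sequential(
                nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, 1))

        def forward(self, pair_features: torch.Tensor) -> torch.Tensor:
            return self.scorer(pair_features).squeeze(-1)

    def train(model, training_pairs, epochs: int = 10, lr: float = 1e-3):
        # training_pairs: list of (features, expected_complementarity) tensor tuples
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            for features, expected in training_pairs:
                optimizer.zero_grad()
                predicted = model(features)          # determined level of complementarity
                loss = loss_fn(predicted, expected)  # difference from the expected level
                loss.backward()
                optimizer.step()                     # adjust internal parameters
        return model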
The acceptance criteria can vary by application. For example, acceptance criteria may be low for community question answering, automated and manual customer support systems, social network communications, and writing by individuals (e.g., consumers) about their experiences with products, such as reviews and complaints. RR acceptance criteria may be high in scientific texts, professional journalism, health and legal documents in the form of FAQs, and professional social networks such as "stackoverflow."
Communicative discourse trees (CDTs)
Application 102 can create, analyze, and compare communicative discourse trees. Communicative discourse trees are designed to combine rhetorical information with speech act structures. A CDT includes arcs labeled with expressions for communicative actions. By combining communicative actions, CDTs enable the modeling of RST relations and communicative actions. A CDT is a reduction of a parse thicket. A parse thicket refers to a combination of parse trees for sentences, where the words of the sentences and the discourse-level relationships between parts of the sentences are combined in one graph. By incorporating labels that identify speech actions, learning of communicative discourse trees can occur over a richer feature set than just rhetorical relations and the syntax of elementary discourse units (EDUs).
In an example, a dispute between three parties concerning [event A] is analyzed, and an RST representation of the arguments being communicated is built. In this example, three conflicting agents, Organization 1, Organization 2, and Organization 3, exchanged their opinions on the matter. The example illustrates a controversial conflict where each party does its best to blame its opponent. To sound more convincing, each party does not just produce its own claim but formulates a response in a way to rebuff the claims of the opponent. To achieve this, each party attempts to match the style and discourse of the opponent's claims.
FIG. 11 illustrates a communicative discourse tree for a claim of a first agent, according to an aspect. FIG. 11 depicts communicative discourse tree 1100, which represents the following text: "[Organization 1] said the evidence points to [Organization 3] as being responsible for [event A]. The report indicates [Condition A], identifies [Condition B], and pins [Event A] on [Organization 3]."
As can be seen from FIG. 11, the non-terminal nodes of the CDT are rhetorical relations, and the terminal nodes are elementary discourse units (phrases, sentence fragments), which are the subjects of these relations. Certain arcs of the CDT are labeled with expressions for communicative actions, including the actor agent of the action and the subject of the action (what is being communicated). For example, the nucleus node of the elaboration relation (on the left) is labeled with say([Organization 1], evidence), and the satellite with responsible([Organization 3], [Action for Event A]). These labels are not intended to express that the subjects of the EDUs are "evidence" and "action," but rather to match this CDT against others in order to find similarities between them. In this case, merely linking these communicative actions by a rhetorical relation, without providing information about the communicative discourse, would be a very limited means of representing the structure of what is communicated and how. Requiring an RR pair to have the same or coordinated rhetorical relations alone is too weak, so agreement between the CDT labels of arcs on top of matching nodes is required.
Straight edges of this graph are syntactic relations, and curved arcs are discourse relations, such as anaphora, same entity, sub-entity, rhetorical relation, and communicative action. The graph includes much richer information than just a combination of parse trees for individual sentences. In addition to CDTs, parse thickets can be generalized at the levels of words, relations, phrases, and sentences. A speech action is a logical predicate expressing the agents involved in the respective speech act and its subject. The arguments of logical predicates are formed in accordance with respective semantic roles, as proposed by a framework such as VerbNet.
FIG. 12 illustrates a communicative discourse tree for a claim of a second agent, according to an aspect. FIG. 12 depicts communicative discourse tree 1200, which represents the following text: "[Organization 2] believes [Condition C], while [Condition D]. [Organization 2] cites an investigation that established [Condition E]."
FIG. 13 illustrates a communicative discourse tree for a claim of a third agent, according to an aspect. FIG. 13 depicts communicative discourse tree 1300, which represents the following text: "[Organization 3] denies [Condition B] while [Condition A]. It is not possible until [Time A] to say [Condition B]."
As can be seen from communicative discourse trees 1100-1300, a response is not arbitrary: a response talks about the same entities as the original text. For example, communicative discourse trees 1200 and 1300 are related to communicative discourse tree 1100. A response backs up a disagreement with estimates and sentiments about these entities and about the actions of these entities.
More specifically, replies of the involved agents need to reflect the communicative discourse of the first, seed message. As a simple observation, because the first agent uses attribution to communicate his claims, the other agents follow suit, either providing their own attributions, attacking the validity of the proponent's attribution, or both. To capture a broad variety of features for how the communicative structure of a seed message needs to be preserved in consecutive messages, pairs of respective CDTs can be learned.
To verify request-response agreement, discourse relations or speech acts (communicative actions) alone are often insufficient. As can be seen from the example depicted in FIGS. 11-13, the discourse structure of interactions between agents and the kinds of interactions are useful. However, the domain of the interactions or the subjects of these interactions, i.e., the entities, do not need to be analyzed.
Representing rhetorical relations and communicative actions
To compute the similarity between abstract structures, two approaches are frequently used: (1) representing these structures in a numerical space and expressing similarity as a number, which is a statistical learning approach, or (2) using a structural representation, such as trees and graphs, instead of a numerical space and expressing similarity as a maximal common substructure. Expressing similarity as a maximal common substructure is referred to as generalization.
Learning communicative actions helps express and understand arguments. Computational verb lexicons help support the acquisition of entities for actions and provide a rule-based form for expressing their meanings. Verbs express the semantics of the event being described as well as the relational information among participants in that event, and project the syntactic structures that encode that information. Verbs, and in particular communicative action verbs, can be highly variable and can display a rich range of semantic behaviors. In response, verb classification helps a learning system deal with this complexity by organizing verbs into groups that share core semantic properties.
VerbNet is one such lexicon, which identifies the semantic roles and syntactic patterns characteristic of the verbs in each class and makes explicit the connections between the syntactic patterns and the underlying semantic relations that can be inferred for all members of the class. Each syntactic frame, or verb signature, of a class has a corresponding semantic representation that details the semantic relations between event participants across the course of the event.
For example, the verb amuse is part of a cluster of similar verbs, such as amaze, anger, arouse, disturb, and irritate, with a similar structure of arguments (semantic roles). The roles of the arguments of these communicative actions are as follows: Experiencer (usually an animate entity), Stimulus, and Result. Each verb can have classes of meanings differentiated by the syntactic features of how this verb occurs in a sentence, or frames. For example, using the following notations for Noun Phrase (NP), Noun (N), Communicative Action (V), Verb Phrase (VP), and Adverb (ADV), the frames for amuse are as follows:
NP V NP. Example: "The teacher amused the children." Syntax: Stimulus V Experiencer. Clauses: amuse(Stimulus, E, Emotion, Experiencer): cause(Stimulus, E), emotional_state(result(E), Emotion, Experiencer).
NP V ADV-Middle. Example: "Small children amuse quickly." Syntax: Experiencer V ADV. Clauses: amuse(Experiencer, Prop): property(Experiencer, Prop), adv(Prop).
NP V NP-PRO-ARB. Example: "The teacher amused." Syntax: Stimulus V. Clauses: amuse(Stimulus, E, Emotion, Experiencer): cause(Stimulus, E), emotional_state(result(E), Emotion, Experiencer).
NP.cause V NP. Example: "The teacher's dolls amused the children." Syntax: Stimulus<+genitive>('s) V Experiencer. Clauses: amuse(Stimulus, E, Emotion, Experiencer): cause(Stimulus, E), emotional_state(during(E), Emotion, Experiencer).
NP V NP ADJ. Example: "This performance bored me totally." Syntax: Stimulus V Experiencer Result. Clauses: amuse(Stimulus, E, Emotion, Experiencer): cause(Stimulus, E), emotional_state(result(E), Emotion, Experiencer), Pred(result(E), Experiencer).
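As an illustration of how such frames can be retrieved programmatically, the following Python sketch queries NLTK's VerbNet corpus reader for the class of "amuse" and prints each frame's primary syntactic pattern and its thematic roles. NLTK is one possible resource among others; the aspects above do not prescribe a specific API. The corpus must first be fetched with nltk.download('verbnet').

    from nltk.corpus import verbnet as vn

    for class_id in vn.classids(lemma='amuse'):        # e.g., 'amuse-31.1'
        vnclass = vn.vnclass(class_id)                 # XML element for the class
        for frame in vnclass.findall('FRAMES/FRAME'):
            primary = frame.find('DESCRIPTION').get('primary')   # e.g., 'NP V NP'
            roles = [np.get('value') for np in frame.findall('SYNTAX/NP')]
            print(class_id, primary, roles)            # syntactic frame + thematic roles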
Communicative actions can be characterized into groups, for example: verbs with predicative complements (appoint, characterize, dub, declare, conjecture, masquerade, orphan, captain, consider, classify); verbs of perception (see, sight, peer); verbs of psychological state (amuse, admire, marvel, appeal); verbs of desire (want, long); judgment verbs (judge); verbs of assessment (assess, estimate); verbs of searching (hunt, search, stalk, investigate, rummage, ferret); verbs of social interaction (correspond, marry, meet, battle); verbs of communication (transfer (message), inquire, interrogate, tell, manner (speaking), talk, chat, say, complain, advise, confess, lecture, overstate, promise); avoid verbs (avoid); measure verbs (register, cost, fit, price, bill); and aspectual verbs (begin, complete, continue, stop, establish, sustain).
Aspects described herein provide advantages over statistical learning models. In contrast to statistical solutions, aspects can provide a verb or verb-like structure that is determined to cause a target feature (such as rhetoric agreement). For example, statistical machine learning models express similarity as a number, which can make interpretation difficult.
Representing request-response pairs
Representing request-response pairs facilitates classification-based operations on the pairs. In an example, request-response pairs can be represented as parse thickets. A parse thicket is a representation of the parse trees for two or more sentences, where the discourse-level relationships between words and parts of the sentences are represented in one graph. Topical similarity between a question and an answer can be expressed as a common subgraph of parse thickets. The higher the number of common graph nodes, the higher the similarity.
FIG. 14 illustrates a parse thicket, according to an aspect. FIG. 14 depicts parse thicket 1400, including a parse tree for request 1401 and a parse tree for corresponding response 1402.
Parse tree 1401 represents the question "I just had a baby and it looks more like the husband I had my baby with. However it does not look like me at all and I am scared that he was cheating on me with another lady and I had her kid. This child is the best thing that has ever happened to me and I cannot imagine giving my baby to the real mom."
Response 1402 represents the response "Marital therapists advise on dealing with a child being born from an affair as follows. One option is for the husband to avoid contact but to have the basic legal and financial commitments. Another option is to have the child fully involved in the family, just like a child from a previous marriage."
FIG. 14 represents a greedy approach to representing linguistic information about a paragraph of text. Straight edges of this graph are syntactic relations, and curved arcs are discourse relations, such as anaphora, same entity, sub-entity, rhetorical relation, and communicative action. Solid arcs are for same entity/sub-entity/anaphora relations, and dotted arcs are for rhetorical relations and communicative actions. Oval labels on straight edges denote the syntactic relations. Lemmas are written inside the boxes for the nodes, and lemma forms are written to the right of the nodes.
Parse thicket 1400 includes much richer information than just a combination of parse trees for individual sentences. Navigation through the graph along the edges for syntactic relations and the arcs for discourse relations allows transforming a given parse thicket into a semantically equivalent form for matching with other parse thickets, thereby performing a text similarity assessment task. To form a complete formal representation of a paragraph, as many links as possible are expressed. Each of the discourse arcs produces a pair of potentially matching phrases.
Topical similarity between the seed (request) and the response is expressed as a common subgraph of parse thickets, visualized as connected clouds. The higher the number of common graph nodes, the higher the similarity. For rhetoric agreement, a common subgraph does not have to be as large as it is in the given text. However, the rhetorical relations and communicative actions of the seed and the response are correlated and need to correspond.
Generalization for communicative actions
A similarity between two communicative actions A1 and A2 is defined as an abstract verb that possesses the features common between A1 and A2. Defining the similarity of two verbs as an abstract verb-like structure supports inductive learning tasks, such as rhetoric agreement assessment. In an example, the similarity between the following two common verbs, agree and disagree, can be generalized as follows: agree ^ disagree = verb(Interlocutor, Proposed_action, Speaker), where Interlocutor is the person who proposed the Proposed_action to the Speaker and to whom the Speaker communicates their response; Proposed_action is the action that the Speaker would perform upon accepting or refusing the request or offer; and Speaker is the person to whom the particular action has been proposed and who responds to the request or offer made.
In another example, the similarity between the verbs agree and explain is represented as follows: agree ^ explain = verb(Interlocutor, *, Speaker). The subjects of communicative actions are generalized in the context of communicative actions, and are not generalized with other "physical" actions. Hence, the respective occurrences of communicative actions are, in various aspects, generalized together with their corresponding subjects.
Additionally, sequences of communicative actions representing dialogs can be compared against other such sequences of similar dialogs. In this manner, both the meaning of an individual communicative action and the dynamic discourse structure of the dialog (in contrast to its static structure reflected via rhetorical relations) are represented. A generalization is a compound structural representation that happens at each level. The lemmas of communicative actions are generalized with lemmas, and their semantic roles with corresponding semantic roles.
Communicative actions are used by text authors to indicate the structure of a dialog or a conflict. Subjects are generalized in the context of these actions and are not generalized with other "physical" actions. Hence, the individual occurrences of communicative actions, together with their pairs, are generalized with their subjects as discourse "steps."
Generalization of communicative actions can also be considered from the standpoint of matching verb frames (e.g., VerbNet). The communicative links reflect the discourse structure associated with the participation (or mentioning) of more than a single agent in the text. The links form a sequence of connected words for communicative actions (either verbs or multi-words implicitly indicating the communicative intent of a person).
A communicative action includes an action, one or more agents upon which the action is performed, and a phrase describing the features of the action. A communicative action can be described in the following form: verb(agent, subject, cause), where verb characterizes some type of interaction between the involved agents (e.g., explain, confirm, remind, disagree, deny, etc.), subject refers to the information transmitted or the object described, and cause refers to the motivation or explanation for the subject.
A scenario (labeled directed graph) is a subgraph of a parse thicket G = (V, A), where V = {action1, action2, ..., actionn} is a finite set of vertices corresponding to communicative actions, and A is a finite set of labeled arcs (ordered pairs of vertices), classified as follows:

Each arc (actioni, actionj) ∈ A_sequence corresponds to the temporal precedence of two actions vi, agi, si, ci and vj, agj, sj, cj that refer to the same subject (e.g., sj = si) or to different subjects. Each arc (actioni, actionj) ∈ A_cause corresponds to an attack relationship between actioni and actionj, the attack relationship indicating that the cause of actioni conflicts with the subject or cause of actionj.

Subgraphs of parse thickets associated with scenarios of interaction between agents have some distinct features. For example: (1) all vertices are ordered in time, so that each vertex (except the initial and terminal ones) has one incoming arc and one outgoing arc; (2) for A_sequence arcs, at most one incoming and only one outgoing arc are admissible; and (3) for A_cause arcs, there can be many outgoing arcs from a given vertex, as well as many incoming arcs. The involved vertices may be associated with different agents or with the same agent (i.e., when he contradicts himself). To compute similarities between parse thickets and their communicative actions, induced subgraphs with the same configuration and similar labeling of arcs, and strict correspondence of vertices, are analyzed.
By analyzing the communicative action arcs of parse thickets, the following similarities exist: (1) one communicative action, with its subject, from T1 against another communicative action, with its subject, from T2 (communicative action arcs are not used); and (2) a pair of communicative actions, with their subjects, from T1 against another pair of communicative actions from T2 (communicative action arcs are used).
Generalizing two different communicative actions is based on their attributes. As can be seen from the example discussed with respect to FIG. 14, one communicative action from T1, cheating(husband, wife, another lady), can be compared with a second communicative action from T2, avoid(husband, contact). A generalization results in communicative_action(husband, *), which introduces a constraint on A of the following form: if a given agent (= husband) is mentioned as the subject of a CA in Q, then he should also be the subject of a (possibly different) CA in A. Two communicative actions can always be generalized, which is not the case for their subjects: if the generalization result of the subjects is empty, the generalization result of the communicative actions with these subjects is also empty.
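A minimal Python sketch of this generalization rule follows, under simplifying assumptions: communicative actions are reduced to a verb, an agent, and a single subject, and word-level generalization is reduced to an equality/POS check (a fuller version would use POS tagging and word2vec, as discussed later). All names are illustrative.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass(frozen=True)
    class CommunicativeAction:
        verb: str
        agent: str
        subject: str

    def generalize_words(w1: str, w2: str, pos1: str = "NN", pos2: str = "NN") -> Optional[str]:
        if w1 == w2:
            return w1      # identical words survive generalization
        if pos1 == pos2:
            return "*"     # same POS: keep a typed placeholder
        return None        # different POS: empty generalization

    def generalize_actions(a1: CommunicativeAction, a2: CommunicativeAction):
        subject = generalize_words(a1.subject, a2.subject)
        if subject is None:
            return None    # empty subject generalization nullifies the whole result
        return CommunicativeAction(
            verb=generalize_words(a1.verb, a2.verb) or "*",
            agent=generalize_words(a1.agent, a2.agent) or "*",
            subject=subject)

    # cheating(husband, wife) ^ avoid(husband, contact) -> *(husband, *)
    print(generalize_actions(CommunicativeAction("cheating", "husband", "wife"),
                             CommunicativeAction("avoid", "husband", "contact")))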
Generalization of RST relations
Some relations between discourse trees can be generalized, such as arcs that represent the same type of relation (presentational relations (e.g., antithesis), subject-matter relations (e.g., condition), and multinuclear relations (e.g., list)). A nucleus, or a situation presented by a nucleus, is indicated by "N." A satellite, or a situation presented by a satellite, is indicated by "S." "W" indicates a writer. "R" indicates a reader (hearer). Situations are propositions, completed actions or actions in progress, and communicative actions and states (including beliefs, desires, approvals, explanations, reconciliations, and others). The generalization of two RST relations with the above parameters is expressed as:
rst1(N1, S1, W1, R1) ^ rst2(N2, S2, W2, R2) = (rst1 ^ rst2)(N1 ^ N2, S1 ^ S2, W1 ^ W2, R1 ^ R2).
The texts in N1, S1, W1, and R1 are generalized as phrases. For example, rst1 ^ rst2 can be generalized as follows: (1) if relation_type(rst1) != relation_type(rst2), then the generalization is empty; (2) otherwise, the signatures of the relations are generalized as sentences: sentence(N1, S1, W1, R1) ^ sentence(N2, S2, W2, R2).
For example: rst-background ^ rst-enablement = (S increases the ability of R to comprehend an element in N) ^ (R comprehending S increases the ability of R to perform the action in N) = increase-VB the-DT ability-NN of-IN R-NN to-IN.
Because the relations rst-background and rst-enablement differ, the RST relation part is empty. The expressions that are the verbal definitions of the respective RST relations are then generalized. For example, for each word or placeholder for a word such as an agent, the word (and its POS) is retained if the word is the same in each input phrase, and the word is removed if the word differs between the phrases. The resulting expression can be interpreted as a common meaning between the definitions of the two different RST relations, obtained formally.
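The following Python sketch applies exactly these two rules to the rst-background/rst-enablement example above, with the defining expressions given as lists of (word, POS) pairs. The encoding is an illustrative assumption.

    def generalize_rst(rst1, rst2):
        # rst1, rst2: (relation_type, [(word, POS), ...] of the defining expression)
        type1, words1 = rst1
        type2, words2 = rst2
        relation = type1 if type1 == type2 else None   # rule (1): empty if types differ
        common = [(w, pos) for (w, pos) in words1 if (w, pos) in words2]
        return relation, common                        # rule (2): sentence generalization

    background = ("rst-background",
                  [("increase", "VB"), ("the", "DT"), ("ability", "NN"),
                   ("of", "IN"), ("R", "NN"), ("to", "IN"), ("comprehend", "VB")])
    enablement = ("rst-enablement",
                  [("increase", "VB"), ("the", "DT"), ("ability", "NN"),
                   ("of", "IN"), ("R", "NN"), ("to", "IN"), ("perform", "VB")])
    print(generalize_rst(background, enablement))
    # -> (None, [('increase','VB'), ('the','DT'), ('ability','NN'),
    #            ('of','IN'), ('R','NN'), ('to','IN')])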
The two arcs between the question and the answer depicted in FIG. 14 show a generalization instance based on the RST relation "RST-contrast." For example, "I just had a baby" is in RST-contrast with "it does not look like me," and this pair is related to "husband to avoid contact," which is in RST-contrast with "have the basic legal and financial commitments." As can be seen, the answer need not be similar to the verb phrases of the question, but the rhetorical structures of the question and the answer are similar. Not all phrases in the answer must match phrases in the question. For example, the phrases that do not match have certain rhetorical relations with the phrases in the answer that are relevant to phrases in the question.
Building a communicative discourse tree
FIG. 15 illustrates an exemplary process for building a communicative discourse tree, according to an aspect. Application 102 can implement process 1500 (e.g., utilizing discourse tree generator 104). As discussed, communicative discourse trees can improve search engine results.
At block 1501, process 1500 involves accessing a sentence including fragments. At least one fragment includes a verb and words, each word including a role of the word within the fragment, and each fragment being an elementary discourse unit. For example, application 102 accesses a sentence such as "[Organization 3] denies [Condition B] while [Condition A]," described with respect to FIG. 13.
Continuing the example, application 102 determines that the sentence includes several fragments. For example, a first fragment is "[Organization 3] denies." A second fragment is "[Condition B]." A third fragment is "while [Condition A]." Each fragment can include a verb, for example, "denies" for the first fragment and "[Action in Condition B]" for the second fragment. However, a fragment need not include a verb.
At block 1502, process 1500 involves generating (e.g., by application 102) a discourse tree that represents rhetorical relationships between the sentence fragments. The discourse tree includes nodes, each non-terminal node representing a rhetorical relationship between two of the sentence fragments, and each terminal node of the discourse tree being associated with one of the sentence fragments.
Continuing the example, application 102 generates a discourse tree as shown in FIG. 13. For example, the third fragment, "while [Condition A]," elaborates "[Condition B]." The second and third fragments together relate to the attribution of what has happened, i.e., the event cannot be attributed to [Organization 3] because [Condition B] did not occur.
At block 1503, process 1500 involves accessing (e.g., by application 102) multiple verb signatures. For example, application 102 accesses a list of verbs (e.g., from VerbNet). Each verb matches or is related to the verb of a fragment. For example, for the first fragment, the verb is "deny." Accordingly, application 102 accesses a list of verb signatures that relate to the verb deny.
As discussed, each verb signature includes the verb of the fragment and one or more thematic roles. For example, a signature includes one or more of Noun Phrase (NP), Noun (N), Communicative Action (V), Verb Phrase (VP), or Adverb (ADV). The thematic roles describe the relationship between the verb and related words. For example, "the teacher amused the children" and "small children amuse quickly" have different signatures. For the first fragment, with the verb "deny," application 102 accesses a list of frames, or verb signatures, for verbs that match "deny." The list is "NP V NP to be NP," "NP V that S," and "NP V NP."
Each verb signature includes thematic roles. A thematic role refers to the role of the verb in the sentence fragment. Application 102 determines the thematic roles in each verb signature. Example thematic roles include actor, agent, asset, attribute, beneficiary, cause, location destination source, destination, source, location, experiencer, extent, instrument, material and product, material, product, patient, predicate, recipient, stimulus, substance, time, and topic.
At block 1504, process 1500 involves determining, for each verb signature of the verb signatures, a number of thematic roles of the respective signature that match a role of a word in the fragment. For the first fragment, application 102 determines that the verb "deny" has only three roles: "agent," "verb," and "theme."
At block 1505, process 1500 involves selecting (e.g., by application 102) a particular verb signature from the verb signatures based on the particular verb signature having a highest number of matches. For example, referring again to FIG. 13, deny in the first fragment "[Organization 3] denies ... that [Condition B]" matches the verb signature deny "NP V NP," and "[Action in Condition B]" matches [Action in Condition B]([Organization 3], [Object in Condition B]). The verb signatures are nested, resulting in the nested signature "deny([Organization 3], [Action in Condition B]([Organization 3], [Object in Condition B]))."
Representing request-response
Request-response pairs can be analyzed alone or as pairs. In an example, request-response pairs can be chained together. In a chain, expected rhetoric agreement holds not only between consecutive members but also between triples and four-tuples. A discourse tree can be constructed for text expressing a sequence of request-response pairs. For example, in the domain of customer complaints, requests and responses are present in the same text, from the viewpoint of the complainant. Customer complaint text can be split into request and response text portions, and then paired positive and negative datasets can be formed. In an example, all text for the proponent is combined with all text for the opponent. The first sentence of each paragraph below will form the request part (which in this example will include three sentences), and the second sentence of each paragraph will form the response part (which in this example will also include three sentences).
FIG. 16 illustrates a discourse tree and a scenario graph, according to an aspect. FIG. 16 depicts discourse tree 1601 and scenario graph 1602. Discourse tree 1601 corresponds to the following three sentences:
(1) I explained that my check bounced (I wrote it after making a deposit). A customer service representative accepted that it usually takes some time to process the deposit.
(2) I reminded them that I was not charged an overdraft fee a month ago in a similar situation. They denied that it was unfair because the overdraft fee was disclosed in my account information.
(3) I disagreed with their fee and wanted this fee deposited back to my account. They explained that nothing can be done at this point and that I need to look into the account rules more closely.
As can be seen from the discourse tree in FIG. 16, it is difficult to determine whether the text represents an interaction or a description. Thus, by analyzing the communicative action arcs of parse thickets, implicit similarities between texts can be found. For example, in general terms:
(1) One communicative action, with its subject, from a first tree against another communicative action, with its subject, from a second tree (communicative action arcs are not used).
(2) A pair of communicative actions, with their subjects, from a first tree against another pair of communicative actions from a second tree (communicative action arcs are used).
For example, in the previous example, the generalization of cheating(husband, wife, another lady) ^ avoid(husband, contact) provides us with communicative_action(husband, *), which introduces a constraint on A in the following form: if a given agent (= husband) is mentioned as the subject of a CA in Q, then he should also be the subject of a (possibly different) CA in A.
To handle the meaning of words expressing the subjects of CAs, a word can be applied to a vector model, such as the word2vec model. More specifically, to compute the generalization between the subjects of communicative actions, the following rules can be used. If subject1 = subject2, then subject1 ^ subject2 = <subject1, POS(subject1), 1>; here, the subject remains and the score is 1. Otherwise, if the subjects have the same part of speech (POS), then subject1 ^ subject2 = <*, POS(subject1), word2vecDistance(subject1 ^ subject2)>, where "*" denotes that the lemma is a placeholder and the score is the word2vec distance between these words. If the POS differs, the generalization is an empty tuple and cannot be further generalized.
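A brief Python sketch of these three rules follows, using gensim word vectors. The pre-trained vector file "vectors.bin" is an assumed artifact, and gensim's cosine similarity is used in place of an unspecified word2vec distance; both are illustrative choices rather than requirements of the aspects above.

    from gensim.models import KeyedVectors

    # Assumed pre-trained word2vec-format vectors; any compatible model would do.
    word_vectors = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

    def generalize_subjects(subject1, pos1, subject2, pos2):
        if subject1 == subject2:
            return (subject1, pos1, 1.0)          # identical subject, score 1
        if pos1 == pos2:
            score = float(word_vectors.similarity(subject1, subject2))
            return ("*", pos1, score)             # placeholder scored by word2vec
        return None                               # different POS: empty tuple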
Classification setting of request-response pairs
In a conventional search, as a baseline, the match between request-response pairs can be measured in terms of keyword statistics such as term frequency-inverse document frequency (TF*IDF). To improve search relevance, the score can be augmented by item popularity, item location, or taxonomy-based scores. A search can also be expressed as a passage re-ranking problem in a machine learning framework. The feature space includes request-response pairs as elements, and a separating hyperplane splits this feature space into correct and incorrect pairs. Thus, a search problem can be formulated in a local way, as a similarity between a request and a response, or in a global, learning way, via the similarity between request-response pairs.
Other methods may be used to determine a match between the request and the response. In a first example, application 102 extracts the features of the request and response and compares the features as counts, introducing a scoring function such that the score will indicate a classification (low scoring for wrong pairs, high scoring for correct pairs).
In a second example, application 102 compares representations of requests and responses to each other and assigns a score to the comparison. Similarly, the score will indicate the classification.
In a third example, application 102 builds a representation of the request and response pair <Req, Resp> as an element of a training set. Application 102 then performs learning in the feature space of all such elements <Req, Resp>.
FIG. 17 illustrates forming a request-response pair, according to an aspect. FIG. 17 depicts request-response pair 1701, request tree (or object) 1702, and response tree 1703. To form a <Req, Resp> object, application 102 combines the discourse tree for the request and the discourse tree for the response into a single tree with a root RR. Application 102 then classifies the objects into correct (with high agreement) and incorrect (with low agreement) categories.
Nearest neighbor graph based classification
Once a CDT is built, to recognize an argument in text, application 102 computes the similarity to the CDTs of the positive class and verifies that the similarity to the set of CDTs of the negative class is lower. The similarity between CDTs is defined by means of maximal common sub-CDTs.
In an example, let G be the ordered set of all CDTs(V, E) with vertex labels from (Λv, ⪯) and edge labels from (ΛE, ⪯). A labeled CDT Γ from G is a pair of pairs of the form ((V, l), (E, b)), where V is a set of vertices, E is a set of edges, l: V → Λv is a function assigning labels to vertices, and b: E → ΛE is a function assigning labels to edges. Isomorphic trees with identical labelings are not distinguished.

The order is defined as follows: for two CDTs Γ1 := ((V1, l1), (E1, b1)) and Γ2 := ((V2, l2), (E2, b2)) from G, Γ1 dominates Γ2, or Γ2 ≤ Γ1 (or Γ2 is a sub-CDT of Γ1), if there exists a one-to-one mapping φ: V2 → V1 such that it (1) respects edges: (v, w) ∈ E2 implies (φ(v), φ(w)) ∈ E1, and (2) fits under labels: l2(v) ⪯ l1(φ(v)), and (v, w) ∈ E2 implies b2(v, w) ⪯ b1(φ(v), φ(w)).
This definition accounts for the calculation of similarity ("weakening") of labels of matched vertices when passing from the "larger" CDT G1 to the "smaller" CDT G2.
Now, a similarity CDT Z of a pair of CDTs X and Y, denoted by X ^ Y = Z, is the set of all inclusion-maximal common sub-CDTs of X and Y, each of which satisfies the following additional conditions: (1) to be matched, two vertices from CDTs X and Y must denote the same RST relation; and (2) each common sub-CDT from Z contains at least one communicative action with the same VerbNet signature as in X and Y.
This definition is easily extended to finding generalizations of several graphs. The subsumption order μ on pairs of graph sets X and Y is naturally defined as X μ Y := X * Y = X.
FIG. 18 illustrates a maximal common sub-communicative discourse tree, according to an aspect. Note that the tree is inverted and the labels of the arcs are generalized: the communicative action site() is generalized with the communicative action say(). The first (agent) argument of the former CA, "[Organization 2]," is generalized with the first argument of the latter CA, "[Organization 1]." The same operation applies to the second arguments of these CAs: "[Person 1]" ^ "evidence."
A CDT U belongs to the positive class if (1) U is similar to (has a nonempty common sub-CDT with) a positive example R+, and (2) for any negative example R−, if U is similar to R− (i.e., U * R− ≠ ∅), then U * R− μ U * R+.

This condition introduces the measure of similarity and says that, to be assigned to a class, the similarity between the unknown CDT U and the closest CDT from the positive class should be higher than the similarity between U and each negative example. Condition 2 implies that there is a positive example R+ such that no negative example R− satisfies U * R+ μ R−, i.e., there is no counterexample to this generalization of the positive examples.
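The following Python sketch condenses this nearest-neighbor condition, under the simplifying assumption that a helper common_sub_cdt(x, y) returns the maximal common sub-CDT of x and y as a collection of nodes whose size serves as the similarity measure. The helper is hypothetical; computing maximal common sub-CDTs is itself a substantial procedure.

    def classify_positive(u, positives, negatives, common_sub_cdt):
        best_positive = max(len(common_sub_cdt(u, r)) for r in positives)
        best_negative = max((len(common_sub_cdt(u, r)) for r in negatives), default=0)
        # Assign U to the positive class only if it is closer to some positive
        # example than it is to every negative example.
        return best_positive > best_negative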
Thicket kernel learning for CDTs
Today, tree kernel learning for strings, parse trees, and parse thickets is a well-established research area. A parse tree kernel counts the number of common subtrees as the discourse similarity measure between two instances. A thicket kernel is defined for a CDT by augmenting a DT kernel with information on communicative actions.
A CDT can be represented by a vector V of integer counts of each subtree type (without taking into account its ancestors): V(T) = (# of subtrees of type 1, ..., # of subtrees of type i, ..., # of subtrees of type n). Because the number of different subtrees is exponential in size, this results in a very high dimensionality, and directly using the feature vector Ø(T) is computationally infeasible. To solve the computational issue, a tree kernel function is introduced to calculate the dot product between the above high-dimensional vectors efficiently. Given two tree segments, CDT1 and CDT2, the tree kernel function is defined as:
K(CDT1, CDT2) = <V(CDT1), V(CDT2)> = Σi V(CDT1)[i] * V(CDT2)[i] = Σn1 Σn2 Σi Ii(n1) * Ii(n2),
where n1 ∈ N1 and n2 ∈ N2, with N1 and N2 being the sets of all nodes in CDT1 and CDT2, respectively; Ii(n) is the indicator function: Ii(n) = {1 if a subtree of type i occurs with a root at node n; 0 otherwise}. K(CDT1, CDT2) is an instance of convolution kernels over tree structures (Collins and Duffy, 2002) and can be computed by the following recursive definitions:
Δ(n1, n2) = Σi Ii(n1) * Ii(n2);
Δ(n1, n2) = 0 if n1 and n2 are assigned the same POS tag or their children are different subtrees;
otherwise, if both n1 and n2 are POS tags (i.e., are pre-terminal nodes), then Δ(n1, n2) = 1 × λ;
otherwise, Δ(n1, n2) = λ Π(j = 1 .. nc(n1)) (1 + Δ(ch(n1, j), ch(n2, j))),
where ch(n, j) is the j-th child of node n, nc(n1) is the number of children of n1, and λ (0 < λ < 1) is a decay factor that makes the kernel value less variable with respect to subtree sizes. In addition, the third recursive rule holds because, given two nodes with the same children, common subtrees can be constructed using these children and the common subtrees of further descendants. The parse tree kernel counts the number of common subtrees as the syntactic similarity measure between two instances.
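A runnable Python sketch of this recursion follows, for trees encoded as (label, children) tuples where leaves are (word, ()) and pre-terminals have a single leaf child. It follows the Collins-and-Duffy-style convolution kernel convention (Δ is zero when the productions at the two nodes differ); it is illustrative rather than an optimized implementation.

    def production(node):
        label, children = node
        return (label, tuple(child[0] for child in children))

    def is_preterminal(node):
        _, children = node
        return len(children) == 1 and not children[0][1]

    def delta(n1, n2, lam=0.4):
        if production(n1) != production(n2):
            return 0.0                       # different productions contribute nothing
        if is_preterminal(n1):
            return lam                       # matching pre-terminal nodes
        result = lam                         # recurse over the shared children
        for c1, c2 in zip(n1[1], n2[1]):
            result *= 1.0 + delta(c1, c2, lam)
        return result

    def internal_nodes(tree):
        nodes = [tree] if tree[1] else []
        for child in tree[1]:
            nodes.extend(internal_nodes(child))
        return nodes

    def tree_kernel(t1, t2, lam=0.4):
        return sum(delta(n1, n2, lam)
                   for n1 in internal_nodes(t1) for n2 in internal_nodes(t2))

    t1 = ("elaboration", (("NP", (("tax", ()),)), ("VP", (("payable", ()),))))
    t2 = ("elaboration", (("NP", (("tax", ()),)), ("VP", (("refundable", ()),))))
    print(tree_kernel(t1, t2))   # counts common subtrees, discounted by lambda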
FIG. 19 illustrates a tree in the kernel learning format for a communicative discourse tree, according to an aspect.
The terms for communicative actions as labels are converted into trees, which are added to the respective nodes for RST relations. For the texts of EDUs as terminal node labels, only the phrase structure is retained: the terminal nodes are labeled with the sequence of phrase types instead of parse tree fragments.
If there is a rhetorical relation arc from node X to a terminal EDU node Y with label A(B, C(D)), then the subtree A-B→(C-D) is appended to X.
Implementation of the rhetoric agreement classifier
Rhetoric agreement classifier 120 can determine the complementarity between two sentences (such as a question and an answer) by using communicative discourse trees. FIG. 20 illustrates an exemplary process for implementing a rhetoric agreement classifier, according to an aspect. FIG. 20 depicts process 2000, which can be implemented by application 102. As discussed, rhetoric agreement classifier 120 is trained with training data 125.
Rhetoric agreement classifier 120 can determine communicative discourse trees for both the question and the answer. For example, rhetoric agreement classifier 120 constructs a question communicative discourse tree from the question and an answer communicative discourse tree from the candidate answer.
At block 2001, process 2000 involves determining, for a question sentence, a question communicative discourse tree including a question root node. A question sentence can be an explicit question, a request, or a comment. Application 102 creates a question communicative discourse tree from the question. Using the example discussed with respect to FIGS. 13 and 15, an example question sentence is "Are [Organization 3] responsible for [Event A]?" Application 102 can use process 1500 described with respect to FIG. 15. The example question has a root node "elaboration."
At block 2002, process 2000 involves determining, for an answer sentence, a second communicative discourse tree, wherein the answer communicative discourse tree includes an answer root node. Continuing the above example, application 102 creates a communicative discourse tree, as depicted in FIG. 13, which also has a root node "elaboration."
At block 2003, process 2000 involves associating the communicative discourse trees by identifying that the question root node and the answer root node are identical. Application 102 determines that the question communicative discourse tree and the answer communicative discourse tree have the same root node. The resulting associated communicative discourse tree is depicted in FIG. 17 and can be labeled as a "request-response pair."
At block 2004, process 2000 involves computing a level of complementarity between the question communicative discourse tree and the answer communicative discourse tree by applying a predictive model to the merged discourse tree.
The rhetoric agreement classifier uses machine learning techniques. In an aspect, application 102 trains and uses rhetoric agreement classifier 120. For example, application 102 defines positive and negative classes of request-response pairs. The positive class includes rhetorically correct request-response pairs, and the negative class includes relevant but rhetorically foreign request-response pairs.
For each request-response pair, application 102 builds a CDT by parsing each sentence and obtaining verb signatures for the sentence fragments.
Application 102 provides the associated communicative discourse tree pair to rhetoric agreement classifier 120. Rhetoric agreement classifier 120 outputs a level of complementarity.
At block 2005, process 2000 involves, responsive to determining that the level of complementarity is above a threshold, identifying the question and answer sentences as complementary. Application 102 can use a threshold level of complementarity to determine whether the question-answer pair is sufficiently complementary. For example, if the classification score is greater than the threshold, application 102 can output the answer. Alternatively, application 102 can discard the answer, access answer database 105 or another public database of candidate answers, and repeat process 2000 as necessary.
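The following Python sketch condenses blocks 2001-2005 into one loop over candidate answers. The helpers build_cdt (standing in for process 1500) and the trained classifier's predict method are assumed interfaces, and the threshold value is illustrative and would be tuned per application.

    COMPLEMENTARITY_THRESHOLD = 0.5   # illustrative value

    def select_answer(question, candidate_answers, classifier, build_cdt):
        question_cdt = build_cdt(question)                 # block 2001
        for candidate in candidate_answers:
            answer_cdt = build_cdt(candidate)              # block 2002
            pair = ("RR", question_cdt, answer_cdt)        # block 2003: merged tree
            level = classifier.predict(pair)               # block 2004
            if level > COMPLEMENTARITY_THRESHOLD:          # block 2005
                return candidate                           # complementary answer
        return None   # no candidate passed; caller may query another database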
In an aspect, application 102 obtains co-references. In another aspect, application 102 obtains entity and sub-entity, or hyponym, links. A hyponym is a word of more specific meaning than a general or superordinate term applicable to it. For example, "spoon" is a hyponym of "cutlery."
In another aspect, application 102 applies thicket kernel learning to the representations. Thicket kernel learning can take place in place of the classification-based learning described above, e.g., at block 2004. Application 102 builds a parse thicket pair for the parse trees of the request-response pair. Application 102 applies discourse parsing to obtain a discourse tree pair for the request-response pair. Application 102 aligns the elementary discourse units of the discourse tree request-response and the parse tree request-response. Application 102 merges the elementary discourse units of the discourse tree request-response and the parse tree request-response.
In an aspect, application 102 improves text similarity assessment by way of the word2vec model.
In another aspect, application 102 sends a sentence corresponding to the question communicative discourse tree, or a sentence corresponding to the answer communicative discourse tree, to a device such as computing device 103. Output from application 102 can be used as input to a search query, a database lookup, or another system. In this manner, application 102 can integrate with a search engine system.
Additional rules for RR agreement and RR irrationality
The following are examples of structural rules that introduce constraints to enforce RR agreement: 1) both the request and the response have the same sentiment polarity (e.g., if a request is positive, the response should be positive as well, and vice versa); 2) both the request and the response are logically argumentative.
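A minimal Python sketch of rule (1) follows, using TextBlob's sentiment polarity as one possible off-the-shelf analyzer; the choice of analyzer is an assumption and not part of the rule itself.

    from textblob import TextBlob

    def polarity(text: str) -> float:
        return TextBlob(text).sentiment.polarity   # value in [-1, 1]

    def rr_polarity_agreement(request_text: str, response_text: str) -> bool:
        # Rule (1): the request and the response must share sentiment polarity.
        return polarity(request_text) * polarity(response_text) >= 0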
Under rational reasoning, a request and its response will be fully agreeable: a rational agent will provide an answer that is both relevant and matches the question rhetoric. However, in the real world, not all responses are fully rational. The body of research on cognitive bias explores human tendencies to think in certain ways that can lead to systematic deviations from a standard of rationality or good judgment.
The correspondence bias is the tendency of people to over-emphasize personality-based explanations for behaviors observed in others who are answering questions. At the same time, those responding to questions underestimate the role and power of situational influences on the same behavior.
Confirmation bias is the tendency to search for or interpret information in a way that confirms the preconceptions of those answering questions. They may discredit information that does not support their views. Confirmation bias is related to the concept of cognitive dissonance, in that individuals may reduce inconsistency by searching for information that reconfirms their views.
Anchoring leads to relying too heavily, or "anchoring," on one trait or piece of information when making decisions.
The availability heuristic makes people overestimate the likelihood of events with greater "availability" in memory, which can be influenced by how recent the memories are or how unusual or emotionally charged they may be.
According to the bandwagon effect, people answering questions believe things because many other people do (or believe) the same.
Belief bias is an effect where someone's evaluation of the logical strength of an argument is biased by the believability of the conclusion.
Bias blind spot is the tendency to see oneself as less biased than other people, or to be able to identify more cognitive biases in others than in oneself.
Building a dialogue discourse tree
FIG. 21 illustrates an exemplary process 2100 for dialogue management using a dialogue discourse tree (DDT), in accordance with at least one embodiment. Process 2100 can include generating a discourse representation (e.g., a dialogue discourse tree (DDT)) for a dialogue. DDTs may be used to classify dialogues as valid (e.g., coherent dialogues with an appropriate rhetoric flow) or invalid (e.g., incoherent dialogues with an illogical flow).
One key step in response selection is measuring the degree of match between the input (e.g., text 128) and a response candidate (of which answer 132 of FIG. 1 is an example). Unlike single-turn conversations, where the input is a single utterance (i.e., message), multi-turn conversations require context-response matching, where both the utterance in the current message and the utterances in previous turns should be taken into account. Challenges of this task include:
1) how to extract information (words, phrases, and sentences) from the context and use that information for matching; and
2) how to model the relationships and dependencies between the utterances in the context.
The text 128 of FIG. 1 may be used to illustrate these challenges. To find an appropriate response to the context (e.g., answer 132), the chat robot must know that 'go swimming class' and 'swim' are important. Without them, it may return a response that is relevant to the message but inappropriate in context (e.g., "What course do you want?"). On the other hand, the mentions of 'Bragg' and 'Wahrard' may be of little use in response selection and may even introduce noise. If the chat robot pays too much attention to these words, its response may drift to an unrelated topic. It is therefore crucial, rather than trivial, for a dialog manager (e.g., application 102 of FIG. 1) to understand the important points in the context and utilize them in matching, while at the same time avoiding noise.
The candidate responses may include: 1) What class do you want? 2) Do you want me to teach you to swim? 3) Do you want to do something else besides swimming? and 4) Have you ever swum the butterfly stroke? The first three answers are correct with respect to the topic but disrupt the logical flow of the conversation. The fourth answer is related to the topic and does not disrupt the logical flow of the conversation. Thus, the chat bot should select the fourth response as answer 132 and provide that response next in conversation 130. Process 2100 provides for selecting an appropriate response (e.g., response 4 above) from a plurality of candidate responses (e.g., responses 1-4 above).
Process 2100 may begin at 2102, where a dialog may be obtained. For example, the text 128 of FIG. 1, an example dialog, may be obtained. The text 128 can include a multi-turn conversation between the chat robot and the user in which multiple utterances (e.g., questions or answers) have been provided. In some embodiments, the text 128 may be obtained by merging the individual utterances of the dialog 130.
At 2104, a training data set for the problem domain of the dialog can be established from documents mined from the web.
Constructing conversational training data from text
Dialog systems continue to face a bottleneck in acquiring training data. In most problem domains, designers of chat robots cannot obtain training dialog data sets of the required quality and quantity, and therefore resort to alternative, lower-quality data sets and techniques such as transfer learning. As a result, relevance and dialog coherence tend to be unsatisfactory.
Text passages of different styles and genres may be converted to conversational form and used as examples in a training data set.
FIG. 22 illustrates an exemplary process 2200 for generating a conversation instance in accordance with at least one embodiment. In process 2200, a paragraph of text (e.g., a document) can be split into text fragments to be used as a set of answers [A_1, A_2, ..., A_N], and questions can be automatically formed from some of these text fragments. Building a dialog from a text T is formulated as splitting T into a sequence of answers A = [A_1, ..., A_n] to form the dialog [A_1, <Q_1, A_2>, ..., <Q_(n-1), A_n>], where A_i answers Q_(i-1) and possibly some of the previous questions, and the union of all A_i is T. Each Q_(i-1) needs to be derived from A_i by linguistic means and generalization; some adjustments may help make these questions sound more natural. To accomplish this, semantically similar phrases can be found on the web and merged with the candidate questions.
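A minimal Python sketch of this decomposition is shown below. The function and variable names are hypothetical, and line-level splitting stands in for proper EDU segmentation; it is an illustration of the [A_1, <Q_1, A_2>, ...] structure, not the patent's reference implementation.

from typing import Callable, List, Tuple

def build_dialog(text: str,
                 make_question: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Split text T into answers [A1..An] and form the dialog
    [A1, <Q1, A2>, ..., <Qn-1, An>], where Qi-1 is derived from Ai."""
    # Naive fragmentation by line; a real system splits into EDUs via a CDT.
    answers = [p.strip() for p in text.split("\n") if p.strip()]
    dialog: List[Tuple[str, str]] = [("", answers[0])]  # A1 opens the dialog
    for fragment in answers[1:]:
        # Qi-1 is formed from Ai by linguistic means and generalization.
        dialog.append((make_question(fragment), fragment))
    return dialog

def naive_question(fragment: str) -> str:
    # Placeholder generator; see the Wh-substitution rules described below.
    return "What about " + " ".join(fragment.split()[:4]).lower() + "?"

if __name__ == "__main__":
    text = ("Theranos promoted its blood-testing technology.\n"
            "But it struggled behind the scenes to make it real.")
    for q, a in build_dialog(text, naive_question):
        print(("Q: " + q + "\n") if q else "", "A: " + a, sep="")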
Question generation has attracted increasing interest as a task branching off from general Q/A. The task is to generate natural language (NL) questions conditioned on an answer and the corresponding document. Question generation has been used in many applications to improve Q/A systems.
A dialog may be formed from text according to the following rule: once a nucleus EDU is completed, and before its satellite EDU begins, a question against that satellite EDU may be inserted. In terms of the dialog flow between the text author and the questioner, the latter "interrupts" the author to ask the question, so that the satellite EDU, and possibly the text that follows, becomes the answer to that question. The question should be about an entity from the nucleus, while the nucleus itself does not contain the answer. The questioner interrupts the text author only where his question sounds appropriate; this need not occur for every nucleus-satellite transition.
For example, process 2200 may be used to establish a dialog from a document. At 2202, input text (e.g., a document) can be broken into text segments (e.g., paragraphs exemplified by text segment 2204).
At 2206, a CDT can be established for each segment of text (e.g., paragraph). For example, process 1500 may be performed (e.g., by DDT generator 104 of fig. 1) to establish a CDT for each text segment.
At 2208, a list of satellite EDUs can be obtained. For example, each satellite EDU may be identified from the CDT established at 2206. Process 2200 may include inserting a question utterance before each of these satellite EDUs. To this end, each such satellite EDU is treated as an answer, and a question is formulated for it while generalizing it.
At 2210, entities/attributes of the satellite EDU may be selected as the focus of the question based at least in part on a set of predefined rules. For example, CDT nodes for nouns, verbs, and adjectives may be selected. A set of one or more questions may be generated based on the selected nouns, verbs, and/or adjectives. For each selected node, a reduction of the parse tree is formed by removing that node, and a question is built for the reduction by replacing the removed node with a "Wh" word (e.g., who, what, where, when, or why). The Wh word may be selected for the vacancy according to the following rules: if the node corresponds to a noun, "Who" or "What" is selected; if the node corresponds to a verb, "What ... do" is selected; and if the node corresponds to an adjective, "Which ..." or "How is" is selected.
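These Wh-substitution rules can be sketched as follows in Python. This is a simplified illustration over (word, POS) pairs with hypothetical names; the raw questions it produces are deliberately rough and are then smoothed by the generalization and web-mining steps at 2212-2214.

def wh_question(tokens, remove_idx):
    """Remove the node at remove_idx from a POS-tagged parse-tree reduction
    and substitute a Wh-word according to the rules above."""
    word, pos = tokens[remove_idx]
    rest = " ".join(w for i, (w, _) in enumerate(tokens) if i != remove_idx)
    if pos.startswith("NN"):                 # noun -> Who / What
        return "What " + rest + "?"
    if pos.startswith("VB"):                 # verb -> What ... do
        return "What did " + rest + " do?"
    if pos.startswith("JJ"):                 # adjective -> Which / How is
        return "Which " + rest + "?"
    return None                              # no rule applies to this node

tokens = [("the", "DT"), ("instrument", "NN"), ("handled", "VBD"),
          ("a", "DT"), ("fraction", "NN"), ("of", "IN"), ("tests", "NNS")]
# Remove the verb "handled" -> "What did the instrument a fraction of tests do?"
print(wh_question(tokens, 2))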
At 2212, a predefined set of rules can be utilized to achieve an appropriate level of generalization for the question. If the question is too broad or too specific, the fixed answer (the satellite EDU) may look unnatural.
At 2214, the question may be issued to the web as a query, to turn it into a question that other people have asked in similar situations. In other words, a question may be confirmed, updated, or created by submitting it as a query and mining the web results. The confirmed/updated/created question may then be inserted after the preceding nucleus EDU and before the current satellite EDU.
Specific examples are provided below.
Establishing a dialog from an utterance tree
Consider the paragraph: "But Theranos has struggled behind the scenes to turn the excitement over its technology into reality. At the end of 2014, the lab instrument developed as the linchpin of its strategy handled just a small fraction of the tests then sold to consumers, according to four former employees."
To convert this paragraph into a dialog, a CDT can be established for the paragraph and used to generate a question for each satellite:
But Theranos has struggled behind the scenes ...
- (question) Why struggle?
... to turn the excitement over its technology into reality. At the end of 2014, the lab instrument developed as ...
- (question) What was the role of the instrument development?
... the linchpin of its strategy handled just a small fraction of the tests then sold to consumers ...
- (question) Who said that?
... according to four former employees.
FIG. 23 illustrates a communication utterance tree (CDT) 2300 that can be used to generate a conversation instance, in accordance with at least one embodiment. The CDT 2300 can represent the text: "But Theranos has struggled behind the scenes to turn the excitement over its technology into reality. At the end of 2014, the lab instrument developed as the linchpin of its strategy handled just a small fraction of the tests then sold to consumers, according to four former employees." The CDT 2300 may be generated using the process 1500 of FIG. 15 (e.g., by the DDT generator 104 of FIG. 1). Once the text is broken into EDUs, certain rhetorical relations holding over satellites can be used to form questions. An Elaboration relation can be identified and used to form a What-question from the verb phrase. A Background relation can be used to form an additional What-question for the corresponding satellite, of the form '... as <predicate> - <subject>'. Finally, an Attribution relation may be used to form a "What/who is the source" question from the corresponding satellite.
A simple way to generate a question is to convert each satellite EDU into a question directly. But this approach yields questions that are too specific and sound unnatural, such as "Who handled just a small fraction of the tests, then sold to consumers, as the linchpin of its strategy?" Less specific, more generalized questions, such as "What did its strategy handle?", are preferable. Beyond the questions formed from Elaboration relations, for Attribution(T, ...) the questions formed may be "Was there excitement about the [Theranos] technology?" or "Did Theranos handle only a small fraction of tests?"
Building questions
A candidate question may be reduced to avoid being overly specific. For example, "What is a British rock band that formed in London in 1970 and received a Grammy Hall of Fame Award in 2004?" is too specific and should be reduced to, for example, "What is a British rock band that formed in London?". To achieve an appropriate level of generalization of questions under the operation at 2212, an extended set of questions (e.g., the Stanford Q/A database (SQuAD)) can be obtained, and pairwise syntactic generalization can be performed to obtain a set of most common question templates. The SQuAD corpus is a machine-comprehension data set consisting of over 100,000 question-answer pairs crowdsourced from more than five hundred Wikipedia articles. For example, generalizing "What is the purpose of life on Earth?" and "Tell me the purpose of complex numbers" yields 'the-DT purpose-NN of-IN NP', with the part-of-speech tags retained. The most common generalization results (question templates) can then be collected.
Phrase reduction rules may be applied at both the individual-phrase and the sentence level. The goal is to derive, from the original satellite EDU expression, a question that is as close as possible to a question template: for each satellite EDU expression, the templates are traversed to find the most similar one. Under syntactic generalization, the most similar template is the one that produces the largest common sub-parse tree with the expression. For the sentence "[I built a bridge]nucleus [with the purpose of fast access to the forest]satellite", the satellite EDU is best covered by the template from the previous paragraph, rather than by, for example, 'access-NN to-TO forest-NN' or 'access-NN to-TO NP', in terms of the number of common terms (parse tree nodes) in the generalization result.
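A sketch of this template selection over POS-tagged expressions is shown below. The names are hypothetical, and positional matching of (word, POS) pairs stands in for full parse-tree generalization.

def generalize(expr1, expr2):
    """Keep (word, POS) where both words match and (None, POS) where only
    the POS matches; a crude stand-in for parse-tree generalization."""
    common = []
    for (w1, p1), (w2, p2) in zip(expr1, expr2):
        if p1 == p2:
            common.append((w1 if w1 == w2 else None, p1))
    return common

def best_template(edu, templates):
    """Pick the template sharing the largest common structure with the EDU."""
    def overlap(tpl):
        g = generalize(edu, tpl)
        # Exact word matches count more than POS-only matches.
        return sum(1.0 if w else 0.2 for w, _ in g)
    return max(templates, key=overlap)

edu = [("the", "DT"), ("purpose", "NN"), ("of", "IN"), ("access", "NN")]
templates = [
    [("the", "DT"), ("purpose", "NN"), ("of", "IN"), ("life", "NN")],
    [("access", "NN"), ("to", "TO"), ("forest", "NN")],
]
print(best_template(edu, templates))   # -> the 'purpose of' template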
To increase the significance, interest, and diversity of the formed and generalized questions, information mined from the web may be utilized. DDT generator 104 may form a web search query based on the question. DDT generator 104 may identify expressions, from web documents or from reputable or popular sources (e.g., a predefined set of sources), that are as close to the question as possible. DDT generator 104 may traverse the web search results and score document titles, snippet sentences, and other expressions found in the web documents based on their semantic similarity to the query. The assessment of semantic similarity is based on the syntactic generalization score between the candidate query and the search result. If such an expression is found in a document, its entities need to be replaced with those from the original question. As a result, the candidate question will appear more popular, more mature, and more diverse.
To verify that a formed and modified question obtained from the satellite EDU text indeed has that text as a good answer, the DDT generator 104 may take the entire original text together with the formed question and verify that the answer is the EDU from which the question was formed, and not some other EDU. If the question was distorted by generalization or by web mining, an erroneous text fragment may appear as the answer.
DDT generator 104 may insert nodes into CDT 2300 to represent the candidate questions. The node representing a question may be inserted after the preceding nucleus EDU and before the current satellite EDU from which the question was formed. Nodes 2304, 2306, and 2308 are examples of nodes inserted into the CDT 2300 to represent questions generated from the corresponding satellite EDUs (corresponding to nodes 2310, 2312, and 2314, respectively). Thereafter, a conversational utterance tree may be generated from the dialog 2302 in the manner described below in connection with FIGS. 24-30.
Returning to FIG. 21, the operations discussed in conjunction with FIG. 22 and FIG. 23 may be performed any suitable number of times on any suitable set of documents to generate the training data set discussed at 2104. The training data set may include any suitable number of DDTs, each representing a dialog instance. Each generated DDT may be associated with a tag indicating that the DDT is valid. At 2106, the training data set generated in the manner discussed in connection with fig. 22 and 23 may be used to train the conversation classifier 122 of fig. 1, which classifies DDTs representing conversation instances as valid or invalid using any suitable supervised or unsupervised machine learning algorithm. Although the operations of 2104 and 2106 are described as being performed after obtaining dialog text at 2102, the operations may alternatively be performed as pre-processing work prior to performing process 2100.
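For instance, the validity classifier might be trained along the following lines. This is a hedged sketch using scikit-learn; the bag-of-relations encoding is an assumed simplification, since the disclosure leaves the DDT feature encoding open (tree kernels over the full DDT structure would be a richer alternative).

import numpy as np
from sklearn.svm import SVC

VOCAB = {"elaboration": 0, "joint": 1, "qna": 2,
         "continue_topic": 3, "confirmation": 4, "doubt": 5}

def bag_of_relations(relations):
    """Encode a DDT by counting its rhetorical relations."""
    v = np.zeros(len(VOCAB))
    for r in relations:
        v[VOCAB[r]] += 1
    return v

# Toy training data: DDTs built from web documents are labeled valid (1);
# corrupted rhetorical flows serve as invalid (0) examples.
X = np.array([bag_of_relations(["qna", "continue_topic", "confirmation"]),
              bag_of_relations(["qna", "doubt", "confirmation"]),
              bag_of_relations(["elaboration", "elaboration", "joint"]),
              bag_of_relations(["joint", "joint", "elaboration"])])
y = np.array([1, 1, 0, 0])

clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict([bag_of_relations(["qna", "confirmation"])]))  # 1 = valid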
At 2108, an utterance characterization of the conversation (e.g., text 128) can be generated. The utterance characterization of the dialog instance may include a dialog utterance tree (DDT). A conversation instance may be generated by merging any suitable number of utterances between the parties. For example, the utterances between the chat bot and the user in text 128 can be merged to form a conversation instance.
FIG. 24 illustrates an exemplary utterance tree 2400 generated from text 128 (e.g., a conversation instance, a dialog formed by merging multiple utterances between a chat robot and a user), in accordance with at least one embodiment. Here, the utterance tree 2400 can include the default rhetorical relations produced by the utterance parser. The utterance tree 2400 can be generated using a conventional utterance parser, as described above in connection with FIGS. 2-7.
To express a conversation via a DT, specific relations between utterances may be added. To link the utterances in the conversation, rhetorical relations and communication actions can be used, as both govern the coherence between requests and responses (questions and answers, the sharing of an intent and its acceptance, and so on).
FIG. 25 illustrates an exemplary conversational utterance tree 2500 generated from the utterance tree 2400 of FIG. 24, in accordance with at least one embodiment. In some embodiments, dialog-specific rhetorical relations may be used to identify relationships between questions and answers. In some embodiments, DDT generator 104 may be configured to parse utterance tree 2400 to identify, based at least in part on a predefined set of rules, particular relations to be replaced by dialog-specific rhetorical relations. These dialog-specific rhetorical relations express the conversational turns between utterances, while the remaining rhetorical relations initially provided by the utterance tree 2400 express relations between the EDUs within utterances. An utterance containing multiple phrases can be split into basic utterance units (EDUs). Representing a dialog as a tree helps establish the logical flow of the dialog and encode abrupt changes in it. A default dialog flow may be encoded by Elaboration and Joint relations, as for regular, non-dialog text: Elaboration means that the recipient will obtain further information in the course of the conversation, and Joint means a concatenation of two utterances.
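A rule-based substitution of this kind can be sketched as follows. The tree representation is hypothetical, and the mapping entries are assumptions drawn from the example discussed in connection with FIG. 26; a relation is rewritten only when it spans a boundary between two conversational turns.

# Dialog-specific relations substituted for default ones when a relation
# spans a boundary between two conversational turns (assumed mapping).
DIALOG_RELATION_MAP = {"elaboration": "continue_topic", "joint": "qna"}

def span(node):
    """(first EDU id, last EDU id) covered by a node of the utterance tree."""
    if "edu" in node:
        return node["edu"], node["edu"]
    return span(node["children"][0])[0], span(node["children"][1])[1]

def to_dialog_tree(node, turn_of_edu):
    """Replace default rhetorical relations crossing a turn boundary with
    dialog-specific ones; relations inside a turn are left untouched."""
    if "edu" in node:
        return node
    left, right = (to_dialog_tree(c, turn_of_edu) for c in node["children"])
    rel = node["relation"]
    if turn_of_edu[span(left)[1]] != turn_of_edu[span(right)[0]]:
        rel = DIALOG_RELATION_MAP.get(rel, rel)
    return {"relation": rel, "children": [left, right]}

tree = {"relation": "elaboration",
        "children": [{"edu": 0}, {"relation": "joint",
                                  "children": [{"edu": 1}, {"edu": 2}]}]}
# EDUs 0-1 belong to the user's turn, EDU 2 to the chat bot's turn,
# so only the inner Joint is rewritten (to QnA).
print(to_dialog_tree(tree, {0: "user", 1: "user", 2: "bot"}))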
Verbal behavior and communication actions
A dialog involves more cognitive states than question answering alone. Some dialog-specific rhetorical relations linking utterances and speech acts are listed below.
[Table: dialog-specific rhetorical relations and the speech acts they link]
Even for conventional text, it is very important to incorporate speech acts into the utterance representation. This is done via the communication DT (CDT), where communication actions label the terminal DT arcs. The role the CDT plays in extending the DT with communicative actions is even more important for representing dialogs. In a DDT there are communication actions within utterances, encoded by labels, and communication actions between utterances, which form additional dialog-specific relations between the utterances. The latter may be considered meta-level communication actions with respect to the former, object-level communication actions.

The DT of a conversation can thus be augmented with some of the communication actions mentioned in the utterances.
FIG. 26 illustrates an exemplary dialog utterance tree (DDT) 2600 generated for a dialog 2602 in accordance with at least one embodiment. A traditional utterance parser can be used to generate an utterance tree for the dialog 2602. The DDT can be generated from that utterance tree based at least in part on identifying particular rhetorical relations (e.g., Elaboration, Joint, etc.) within it. For example, the root-level and higher-level Elaboration, Elaboration, and Joint relations of the DT, presented at 2604, 2606, and 2608, respectively, may be identified. Once identified, the corresponding dialog-specific rhetorical relations can be determined. In this example, the Elaboration, Elaboration, and Joint relations presented at 2604, 2606, and 2608 may be changed to Continue_topic, QnA, and Confirmation, respectively. This is performed to distinguish a set of utterances treated as plain text from a set of utterances structured as a dialog. Continue_topic, QnA, and Confirmation outline the dialog structure and its overall logic.
The communication actions in this dialog can be recognized both between utterances (Continue_topic and Confirmation) and inside utterances, as indicated in italics following some relations in FIG. 26. The former constitute meta-level communication actions with respect to the latter, object-level ones.
Dialogue with doubt
Consider a simple dialog in which the answer recipient A does not accept the answer to Q and expresses doubt (disagreement, distrust, rejection):
Q: What does [Organization 1] do?
A: [It does B]. [default answer]
Q: Is it true?
A: Indeed, it often [does B].
The resulting DDT includes the dialog-specific Doubt relation; QnA is the default dialog rhetorical relation.
doubt
  qna
    Text: What does [Organization 1] do?
    Text: [It does B].
  confirm(A, distrust(Q), qna(A))
    Text: Is it true?
    Text: Indeed, it often [does B].
Three utterance candidates that retain the appropriate entities and sentiments but disrupt the rhetorical flow are provided below. Comments are placed within square brackets:
1. It also [does C] [does not address the doubt]
2. It is great at [doing D] [does not address the doubt; uses unexpectedly positive sentiment]
3. It participates in [doing E] [different entity, hence an inappropriate entity flow]
Correct alternative utterances would be:
1. Well, it is not always responsible for [doing C] [weakens the first answer]
2. Others are responsible as well [weakens the first answer]
3. You are right. It is entirely negligent [confirms the doubt]
Further extending rhetorical relations for dialogs
Task-oriented dialog is a complex activity involving two participants, each with their own beliefs and intentions, working together to interactively solve a problem. There is a whole set of behaviors related to maintaining and synchronizing beliefs in collaboration that does not arise in a monologue. These include answering questions, agreeing to proposals, and simply acknowledging that the other participant has spoken. An example of such a dialog is provided below.
User- "We have to send album to the emergency area 1" (We must send the ambulance to the emergency area 1.) "
User- "the re are peoples There to the evacuate (where someone needs to be evacuated.)"
Robot- "How is the horse we sensing? (how much we want to send?
User- "three. (three.)".
Robot "OK (good)"
Robot- "So 1 ambiance to the environmental area E1 and 2 ambiances to the environmental area E2? (then, 1 ambulance driven to the emergency area E1,2 ambulances driven to the emergency area E2?
Additional dialog-specific rhetorical relations are provided below:
[Table: additional dialog-specific rhetorical relations]
Improved conversational utterance parser
Certain aspects relate to an improved utterance parser. Predicting the rhetorical relation between two sentences is the goal of utterance parsing, along with text segmentation (splitting a sentence into basic utterance units). While a document can be analyzed as a sequence of hierarchical utterance structures, a question of utterance coherence is how rhetorical relations are expressed in the source text (and whether they can be recognized by a parser). For example, rhetorical relations are often signaled by utterance markers such as and, because, however, and while; relations containing such markers are sometimes classified as explicit relations. Utterance markers are reliable signals of coherence relations.
Existing utterance parsers use machine learning methods and labeled data sets that are time-consuming to produce and difficult to expand and extend. The task of predicting utterance relations is already complex by nature, and the cost of extending these labeled data sets makes it harder still. As a result, many available utterance parsers assign Elaboration and Joint relations where other, more descriptive utterance relations would be more appropriate, so the recall for establishing those more specific rhetorical relations may be relatively low.
Existing utterance parsers can be improved by performing additional analysis (e.g., semantic analysis) on the utterance tree and adapting the tree accordingly. In the examples discussed below, an utterance parser is applied to the text, and any resulting Elaboration or Joint rhetorical relations are replaced, where available, by more appropriate rhetorical relations obtained using a semantic analysis (e.g., Abstract Meaning Representation (AMR)) scheme. In general, this method applies to rhetorical relations within sentences. The techniques described in connection with FIGS. 25 and 26 may be applied similarly to DDTs generated from dialogs.
FIG. 27 depicts an utterance tree and a semantic tree in accordance with an aspect: utterance tree 2710 and semantic tree 2720.
Utterance tree 2710 and semantic tree 2720 each represent the following text: "It was a question of life or death for me: I had scarcely enough drinking water to last a week."
Utterance tree 2710 is represented in text-based form as follows (indentation reflects the level of nesting in the tree):
elaboration
  Text: It was a question of life or death for me:
  elaboration
    Text: I had scarcely enough drinking water
    Text: to last a week.
As can be seen from the utterance tree 2710, the second Elaboration relation 2712, which the utterance parser generated for "I had scarcely enough drinking water" and "to last a week", is not accurate with respect to the text, since "to last a week" is not merely an elaboration of "I had scarcely enough drinking water". This can be improved by utilizing the AMR relations (e.g., semantic relations) in semantic tree 2720, which is also shown below in text-based form:
[Text-based AMR representation of semantic tree 2720]
As can be seen in semantic tree 2720, the semantic relation Purpose, identified as relation 2722, has a semantic role related to the verb drink, identified as role 2724. The nucleus EDU ("I had scarcely enough drinking water") containing drink in the utterance tree 2710 can be identified because the utterance tree 2710 and the semantic tree 2720 have a common entity (e.g., both include the verb "drink").
The higher the number of common entities between the utterance tree and the semantic-tree template, the higher the degree of match for improving the rhetorical relation. Continuing the example, the satellite EDU ("to last a week") is identified, together with its rhetorical relation, Elaboration. Finally, Elaboration is replaced by Purpose to obtain a more accurate utterance tree. The link between Elaboration and Purpose is shown as link 2730.
In some cases, the conversational utterance tree can be improved with semantic information, for example, when a particular rhetorical relation is implied by the absence of an utterance marker, by an ambiguous or misleading marker, or by a deeper semantic representation of the sentence (such as AMR). Once syntactic similarity is established between the parsed text and an AMR scheme, the semantic roles of AMR verbs can be interpreted at the utterance level as the corresponding rhetorical relations. This mapping between the semantic relations of AMR and specific rhetorical relations is established independently of the way the Joint components, nucleus components, and satellite EDUs are connected.
As a result of manual generalization of the available AMR annotations, a mapping between (AMR) semantic relations and rhetorical relations was developed, as shown in the table below. The table illustrates examples of semantic roles and the corresponding rhetorical relations. The first column lists the rhetorical relation to be detected. The second column shows the AMR semantic relation mapped to that rhetorical relation. The third column provides an illustrative sentence, to be matched against the sentence under analysis. The fourth column shows the AMR parse of the template.
To create such a mapping from rhetorical relations to semantic roles, which can be performed offline (e.g., before runtime), a list of rhetorical relations is considered. For each rhetorical relation, a set of AMR labels for the particular semantic relation can be determined. Once a correspondence is identified, a mapping entry is created, represented by a row in the table below. The table illustrates rhetorical relations that are thoroughly represented by AMR examples.
[Table: mapping between rhetorical relations and AMR semantic relations, with illustrative sentences and their AMR parses]
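In code, such an offline mapping might be represented as a small lookup table. The entries below are illustrative assumptions reconstructed from the examples in this disclosure (Purpose, Condition, Comparison), not the complete table, and the role names are assumed rather than taken from a specific AMR release.

# Assumed, partial mapping from AMR semantic roles to rhetorical relations.
AMR_TO_RHETORICAL = {
    ":purpose": "purpose",      # e.g., "... to last a week"
    ":condition": "condition",  # e.g., "If one gets lost in the night ..."
    ":cause": "cause",
    ":concession": "concession",
    ":time": "temporal",
}

def refine_relation(default_relation, amr_role, generalization_score,
                    threshold=2.0):
    """Replace a default Elaboration/Joint relation with the relation mapped
    from the AMR role of the matched template, but only when the syntactic
    generalization score clears the confidence threshold."""
    if (default_relation in ("elaboration", "joint")
            and amr_role in AMR_TO_RHETORICAL
            and generalization_score >= threshold):
        return AMR_TO_RHETORICAL[amr_role]
    return default_relation

print(refine_relation("elaboration", ":purpose", 3.1))  # -> "purpose"
print(refine_relation("elaboration", ":purpose", 0.5))  # below threshold: kept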
The table below provides examples of refining an utterance tree so that Elaboration becomes a more specific relation. The first example shows the template being built and refined, with the detected rhetorical relation shown in bold. The second example shows an actual refinement in which Elaboration is replaced by applying the template in the second row from the bottom. The syntactic generalization between the template and the sentence is also shown.
[Table: examples of refined utterance trees, with the detected rhetorical relations and the syntactic generalizations]
To replace an Elaboration rhetorical relation with a rhetorical relation obtained from AMR, syntactic similarity is established between the nucleus and satellite basic utterance units of the Elaboration and the semantic tree of a template (also referred to herein simply as a "template" for brevity). If this similarity is high (e.g., as determined by the syntactic similarity score produced by syntactic generalization), the Elaboration can be rewritten with high confidence. The higher the syntactic generalization score, the higher the confidence that the semantic role derived from the template accurately describes the rhetorical relation. Formal learning of such a mapping is difficult without a sufficiently extensive alignment between AMR data and rhetorical-relation data; therefore, a threshold on the similarity score is used.
Another example of using semantic relations and roles to improve rhetorical relations is shown in FIG. 28.
FIG. 28 depicts an utterance tree and a semantic tree in accordance with an aspect: utterance tree 2810 and semantic tree 2820. Utterance tree 2810 represents the text "I ate the most wonderful hamburger that she had ever bought for me". Semantic tree 2820 does not represent the same text as utterance tree 2810. Instead, semantic tree 2820 represents template text that is a close match to the text of utterance tree 2810 and can be used to refine utterance tree 2810.
Utterance tree 2810 is represented in the following form:
elaboration
  Text: I ate the most wonderful hamburger
  Text: that she had ever bought for me.
As can be seen from utterance tree 2810, the two basic utterance units "I ate the most wonderful hamburger" and "that she had ever bought for me" are connected by an Elaboration relation. Utterance tree 2810 is therefore a good candidate for improvement, since Elaboration may not be the most accurate rhetorical relation here.
The AMR semantic roles of the matched template map to the Comparison rhetorical relation. If an EDU pair with a default rhetorical relation is semantically similar to a template (an AMR representation of a predefined text) whose particular semantic relation can be mapped to a rhetorical relation, the Elaboration provided by default utterance parsing can be translated into that more accurate rhetorical relation. To establish an accurate rhetorical relation between the EDUs of a sentence, an attempt is made to match templates found in a set of templates (e.g., an AMR repository). The matched template here is the sentence "It was the most magnificent and stately planet that he had ever seen".
To match the parsed EDU pair with a template, the EDU and the template may be aligned and generalized. In this case, the syntactic generalization between the EDU pair and the template is [VB-* DT-the RBS-most JJ-(wonderful^magnificent) IN-that PRP-she VB-had RB-ever VB-*], so there is significant evidence that the parsed sentence and the template share a common syntactic structure. For example, wonderful^magnificent produces an abstract adjective whose meaning represents the commonality between these adjectives. Connection 2830 shows the correspondence between the adjective magnificent in the AMR representation and the adjective wonderful in the original DT.
Thus, the Elaboration in utterance tree 2810 is replaced with a rhetorical relation of type Comparison. The corrected utterance tree is as follows:
comparison
  Text: I ate the most wonderful hamburger
  Text: that she had ever bought for me.
Building on the above examples, the process of improving an utterance tree is further described below.
Fig. 29 is a flow diagram of an exemplary process 2900 for generating a dialog utterance tree, according to an aspect. It is to be understood that in some cases, one or more operations in process 2900 may not be performed. Process 2900 may be performed by DDT generator 104 of fig. 1.
At 2902, process 2900 may include obtaining text that includes a dialog. A dialog may include any utterances of a conversation between two parties (e.g., a user and a chat robot). In some embodiments, the dialog may be formed by merging multiple utterances of the conversation.
At block 2904, process 2900 involves creating an utterance tree (e.g., utterance tree 2400 of FIG. 24) from the text (e.g., text 128) by recognizing the basic utterance units in the text. Block 2904 involves operations substantially similar to those of block 1502 of process 1500. The resulting utterance tree (e.g., utterance tree 2400) includes nodes: each non-terminal node represents a rhetorical relation between two basic utterance units, and each terminal node of the utterance tree is associated with a basic utterance unit.
At block 2906, process 2900 involves identifying, in the utterance tree, a rhetorical relation of type Elaboration or Joint. The rhetorical relation relates two basic utterance units, e.g., a first basic utterance unit and a second basic utterance unit (rather than two other rhetorical relations, or one rhetorical relation and one basic utterance unit).
Together, the first and second basic utterance units form a reference sentence. For example, referring back to FIG. 28, the first EDU is "I ate the most wonderful hamburger", the second EDU is "that she had ever bought for me", and the rhetorical relation (before the update) is Elaboration.
At block 2908, process 2900 involves determining a syntactic generalization score for each candidate sentence in a set of candidate sentences. As described above, each candidate sentence has a corresponding semantic relation (e.g., an AMR representation). In a simplified example, the syntactic generalization score is the number of common entities between the reference sentence and the candidate sentence; each of these common entities shares a common part of speech between the candidate sentence and the reference sentence. The syntactic generalization score can also be calculated in other ways, as described below.
The goal of abstract generalization is to find commonality between portions of text at various semantic levels. Generalization can be performed at the level of paragraphs, sentences, EDUs, phrases, and individual words. Except at the word level, the result of generalizing two expressions is a set of expressions. In such a set, for each pair of expressions, if one expression is less general than the other, the latter is eliminated. The generalization of two sets of expressions is the set of sets of expressions obtained by pairwise generalization of those expressions. Aspects of FIG. 29 are discussed with respect to FIG. 30, which illustrates generalization, and FIG. 31, which illustrates alignment.
FIG. 30 depicts the generalization of a sentence and a template with a known semantic relation, according to an aspect. FIG. 30 shows the generalization 3010 of the sentence "If you read a book at night, your knowledge will improve" against the template 3020 "If one gets lost in the night, such knowledge is valuable". The resulting generalization 3030 is as follows:
[IN-If PRP-* VB-* NN-night ... NN-knowledge]. Although in this template "IN-If PRP-* VB-*" carries the semantic relation Condition() and is the signature of the Condition utterance relation, there are also further common words such as "NN-night" and "NN-knowledge".
To determine how to compute an appropriate generalization score, the problem can be formulated as finding the best weights for nouns, adjectives, verbs, and their forms (e.g., gerunds and past tense) such that search relevance is maximized. Search relevance can be measured as the deviation of the search-result ordering from the best ordering for a given query; the current ordering is determined by the generalization score for a given set of POS weights (with the other generalization parameters fixed). Performing this optimization yields W_NN = 1.0, W_JJ = 0.32, W_RB = 0.71, W_CD = 0.64, W_VB = 0.83, and W_PRP = 0.35, excluding common frequent verbs such as get, take, set, and put, for which W_VBcommon = 0.57. W_<POS,*> is set to 0.2 (different words but the same POS), and W_<*,word> = 0.3 (the same word appearing as different POS in the two sentences). The weight W_{and, as, but, while, however, because} for utterance markers is normalized to a default value of 1.
The generalization score between the reference sentence (ref_sentence) and a candidate template (template) may be represented as a weighted sum of word generalizations, summed over the aligned phrases:

score(ref_sentence, template) = Σ_{NP, VP, ...} Σ W_POS · word_generalization(word_ref_sentence, word_template)
The maximal generalization can then be defined as the generalization with the highest score.
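A sketch of this scoring in Python follows, under stated assumptions: positional alignment of (word, POS) pairs stands in for the phrase-level alignment described next, and the weights are those listed above.

POS_WEIGHTS = {"NN": 1.0, "JJ": 0.32, "RB": 0.71, "CD": 0.64,
               "VB": 0.83, "PRP": 0.35}
COMMON_VERBS = {"get", "take", "set", "put"}                   # W_VBcommon
MARKERS = {"and", "as", "but", "while", "however", "because"}  # weight 1.0
W_POS_ONLY, W_WORD_ONLY = 0.2, 0.3

def word_generalization(w1, p1, w2, p2):
    """Weighted generalization of two (word, POS) pairs."""
    pos1, pos2 = p1[:2], p2[:2]
    if w1 == w2 and pos1 == pos2:
        if w1 in MARKERS:
            return 1.0
        if pos1 == "VB" and w1 in COMMON_VERBS:
            return 0.57
        return POS_WEIGHTS.get(pos1, W_POS_ONLY)
    if pos1 == pos2:
        return W_POS_ONLY       # different words, same POS
    if w1 == w2:
        return W_WORD_ONLY      # same word, different POS
    return 0.0

def generalization_score(ref_sentence, template):
    return sum(word_generalization(w1, p1, w2, p2)
               for (w1, p1), (w2, p2) in zip(ref_sentence, template))

ref = [("if", "IN"), ("you", "PRP"), ("read", "VB"), ("night", "NN")]
tpl = [("if", "IN"), ("one", "PRP"), ("gets", "VBZ"), ("night", "NN")]
print(generalization_score(ref, tpl))   # 0.2 + 0.2 + 0.2 + 1.0 = 1.6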
At the phrase level, generalization begins with finding an alignment between the two phrases (as many word correspondences between them as possible). The alignment operation is performed so as to maintain phrase integrity; for example, two phrases can only be aligned if a correspondence between their head nouns is established. Similar integrity constraints apply to the alignment of verb, preposition, and other types of phrases.
FIG. 31 depicts the alignment between two sentences according to an aspect: sentence 3110 "Use the screwdriver from the tools for fixing heaters" and sentence 3120 "Get short screwdriver for electric heaters". The resulting alignment 3130 is as follows:
VB-* JJ-* NN-screwdriver NN-* IN-for NN-*
In an aspect, a conversational utterance tree may be generated using separate generalization of the nucleus and the satellite. For example, an utterance tree is created at block 2904 of process 2900, and a rhetorical relation is identified from it. It is then determined whether the relation is suitable for refinement. Examples of suitable rhetorical relations include the innermost Elaboration and Joint relations, as well as nested ones (an Elaboration over another Elaboration).
The nucleus and satellite EDUs are identified. If they are too complex or too long, the size and/or complexity of these EDUs may be reduced. The nucleus EDU is generalized with each template from the template table provided above, and the candidate with the highest generalization score is selected. If that score is above a threshold, the satellite EDU corresponding to the rhetorical relation is generalized with the template. If the generalization score of the satellite EDU is also above a threshold, the template's rhetorical relation is used in place of the rhetorical relation in the reference sentence.
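This two-stage, thresholded procedure might be sketched as follows, reusing the hypothetical generalization_score function from the earlier sketch; the template structure and threshold value are assumptions.

def refine_with_templates(nucleus, satellite, default_rel, templates,
                          threshold=1.5):
    """Generalize the nucleus EDU against each template; only when the
    nucleus score clears the threshold is the satellite checked, and only
    when the satellite also clears it is the template's relation adopted."""
    scored = sorted(templates,
                    key=lambda t: generalization_score(nucleus, t["nucleus"]),
                    reverse=True)
    for tpl in scored:
        if generalization_score(nucleus, tpl["nucleus"]) < threshold:
            break                              # no template is close enough
        if generalization_score(satellite, tpl["satellite"]) >= threshold:
            return tpl["relation"]             # confident replacement
    return default_rel                         # keep Elaboration/Joint

templates = [{
    "relation": "purpose",
    "nucleus": [("i", "PRP"), ("built", "VBD"), ("bridge", "NN")],
    "satellite": [("purpose", "NN"), ("of", "IN"), ("access", "NN")],
}]
nucleus = [("i", "PRP"), ("built", "VBD"), ("bridge", "NN")]
satellite = [("purpose", "NN"), ("of", "IN"), ("access", "NN")]
print(refine_with_templates(nucleus, satellite, "elaboration", templates))
# -> "purpose"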
Returning to FIG. 29: at block 2910, process 2900 involves selecting the candidate sentence with the highest syntactic generalization score among the computed scores.
In one aspect, no match is found. For example, DDT generator 104 searches the Abstract Meaning Representation (AMR) dataset and identifies that the identified semantic relation is not in the AMR dataset; DDT generator 104 then replaces the rhetorical relation in the utterance tree (e.g., utterance tree 2400) with an additional semantic relation that is in the AMR dataset.
At block 2912, process 2900 involves identifying the semantic relation that corresponds to the candidate sentence. The semantic relation corresponds to a word of the candidate sentence and defines a role in the candidate sentence.
At block 2914, process 2900 involves replacing, in the utterance tree 2400, the rhetorical relation with an updated rhetorical relation that corresponds to the semantic relation, thereby creating a conversational utterance tree (e.g., the conversational utterance tree 2500 of FIG. 25). The rhetorical relation matching the semantic relation identified at block 2912 is determined and inserted into the utterance tree in place of the rhetorical relation identified at block 2906.
Returning to FIG. 21: at 2110, process 2100 involves iterating over a plurality of candidate utterances. The candidate utterances may be identified by the DDT processing module 106 from the answer database 105 of FIG. 1. For example, the answer 132 of FIG. 1 is an example of an utterance candidate that may be identified based at least in part on the text 128 of FIG. 1.
At 2112, a conversational utterance tree may be generated for each candidate utterance. The DDT may represent the current conversation (e.g., text 128) plus the candidate utterance (e.g., answer 132).
Fig. 32 is a flow diagram depicting an exemplary process 3200 for generating a conversational utterance tree corresponding to a candidate response (e.g., answer 132 of fig. 1) in accordance with at least one embodiment. Process 3200 may be performed by DDT generator 104 of figure 1.
Process 3200 can begin at 3202, where an utterance tree (e.g., utterance tree 2400) can be established from a dialog, such as a dialog generated by merging utterances (e.g., text 128 and answer 132) into a text, using default utterance parsing.

At 3204, the utterances of the utterance tree (e.g., utterance tree 2400) can be split, and dialog-specific rhetorical relations can be substituted (e.g., QnA can replace at least one of the rhetorical relations of utterance tree 2400), as discussed in connection with FIG. 25.

At 3206, dialog-specific rhetorical relations may be identified between the utterances of the dialog. Identifying dialog-specific rhetorical relations may be performed as described above in connection with FIGS. 25 and 26.

At 3208, the dialog-specific rhetorical relations can be revised based on the AMR templates. These revisions may be performed in the manner described in connection with FIGS. 26-31.

At 3210, the request and response portions of the dialog may be identified, and request and/or response rhetorical relations may be identified in the utterance tree.

At 3212, dialog-specific rhetorical relations corresponding to negation, doubt, confirmation, and repetition may be identified and labeled in the utterance tree.
At 3214, DDT generator 104 may clarify any requests and/or responses.
At 3216, the DDT of the candidate response (e.g., answer 132) may be output.
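Process 3200 can be summarized by the following sketch; all helper callables are hypothetical placeholders for the steps above, shown here as identity stubs.

def build_candidate_ddt(dialog_utterances, candidate_response, parser,
                        dialog_rules, amr_reviser, label_speech_acts):
    """Sketch of process 3200: merge the dialog with a candidate response,
    parse it with defaults, then apply the dialog-specific revisions."""
    text = "\n".join(dialog_utterances + [candidate_response])
    tree = parser(text)               # 3202: default utterance tree
    tree = dialog_rules(tree)         # 3204-3206: dialog-specific relations
    tree = amr_reviser(tree)          # 3208: AMR-based revision
    tree = label_speech_acts(tree)    # 3210-3212: requests/responses,
                                      # negation, doubt, confirmation
    return tree                       # 3216: the candidate's DDT

ddt = build_candidate_ddt(
    ["Can my first lesson be on the butterfly stroke?"],
    "Sure, have you swum the butterfly before?",
    parser=lambda t: {"relation": "qna", "text": t},
    dialog_rules=lambda tr: tr,       # identity stubs for the demo
    amr_reviser=lambda tr: tr,
    label_speech_acts=lambda tr: tr)
print(ddt["relation"])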
Returning to FIG. 21: at 2114, the rhetorical agreement of the question/answer pairs formed from the questions of text 128 and the candidate answers may be assessed.
A sub-problem of utterance-driven dialog management is how to coordinate individual question-answer pairs. These considerations also cover the case of arbitrary requests and arbitrary responses, which involves an appropriateness that goes beyond topic relevance and is typical of question-answer pairs. The argumentation pattern in a question needs to be reflected in the argumentation pattern of the answer; the latter may contain a defeat of the argument or its support. Irony in a question should be met by irony or sarcasm in the answer. Doubt in a question should be answered by rejection or confirmation. A knowledge-sharing intent in an utterance needs to be accompanied by acceptance or rejection of that knowledge in the answer. Certain patterns of thought expressed in a question may need to be responded to with matching patterns of thought to be accepted by social norms. A request may have an arbitrary rhetorical structure as long as the subject of the request or question is clear to its recipient, and the response may likewise have an arbitrary rhetorical structure. However, if the response is appropriate for the request, these structures should be interrelated. A computational measure can be used to assess whether the logic, the rhetorical structure, of a request or question agrees with that of the response or answer.
When a question formulated as a phrase or sentence is answered, the answer must address the subject of the question. When a question is implicitly expressed by the seed text of a message, its answer is expected not only to maintain the topic but also to match the cognitive state of the seed. For example, when someone wants to sell an item with particular features, the search results should not only contain those features but also indicate a purchase intent. When someone wants to share knowledge about an item, the search results should contain an intent to receive a recommendation. When someone asks for an opinion on a subject, the answer should share an opinion on that subject, not pose another request for opinions. Modern dialog management systems and automated email reply systems achieve good accuracy in maintaining the topic, but maintaining the communication utterance flow is a much harder problem. This measure of rhetorical agreement needs to be learned from data, because it is difficult to devise explicit rules for coordinated rhetorical structures.
To assess the rhetorical agreement of a question/answer pair formed from a question of text 128 and a candidate answer (e.g., answer 132), the pair may be provided to the rhetorical agreement classifier 120 of FIG. 1. Using the trained rhetorical agreement classifier 120, the DDT processing module 106 determines whether the pairing of the candidate answer (e.g., answer 132, "Sure, have you ever swum using the butterfly stroke before?") with the question is relevant to the topic. If the candidate answer (e.g., answer 132) is deemed relevant to the topic, the DDT processing module 106 may proceed to 2116. If the candidate answer is deemed not relevant to the topic, it may be rejected, and the DDT processing module 106 may proceed to 2110 of process 2100 to process the next candidate utterance.
At 2116, process 2100 involves classifying the DDTs generated for the candidate utterances. In some embodiments, the DDT processing module 106 may classify a question/answer pair of a candidate response as relevant to the topic and, if so, continue to classify the DDT corresponding to that candidate response, if any. In some embodiments, the question/answer pairs of multiple candidate responses may be classified, and for the pairs determined to be relevant to the topic, the corresponding DDTs may be classified. To classify a DDT, DDT processing module 106 may provide the DDT generated (e.g., by DDT generator 104) for each candidate response (and, in some cases, for each topic-relevant candidate response) to the conversation classifier 122 of FIG. 1. The conversation classifier 122 may be trained in the manner discussed in connection with FIGS. 22 and 23 to classify DDTs as valid (e.g., preserving an appropriate rhetorical flow between utterances) or invalid (e.g., breaking the appropriate rhetorical flow between utterances).
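The two-stage gating at 2114-2118 might look as follows. This is a hedged sketch: both classifiers, the feature extractors, and the DDT builder are placeholders standing in for the trained models described above.

def select_response(question, candidates, agreement_clf, dialog_clf,
                    featurize_pair, featurize_ddt, build_ddt):
    """First gate: rhetorical agreement of the Q/A pair (classifier 120).
    Second gate: validity of the candidate's DDT (classifier 122)."""
    for answer in candidates:
        if agreement_clf.predict([featurize_pair(question, answer)])[0] != 1:
            continue                 # off-topic: reject, try next (2110)
        ddt = build_ddt(question, answer)
        if dialog_clf.predict([featurize_ddt(ddt)])[0] == 1:
            return answer            # valid DDT: accept and respond (2118)
    return None                      # no candidate passed both gates

class AlwaysPredict:
    """Trivial stand-in classifier for demonstration purposes."""
    def __init__(self, label):
        self.label = label
    def predict(self, xs):
        return [self.label] * len(xs)

print(select_response(
    "Can my first lesson be on the butterfly stroke?",
    ["Sure, have you swum the butterfly before?"],
    AlwaysPredict(1), AlwaysPredict(1),
    featurize_pair=lambda q, a: [len(q), len(a)],
    featurize_ddt=lambda d: [len(d)],
    build_ddt=lambda q, a: q + " | " + a))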
If the response candidate is deemed relevant to the topic at 2114 and the corresponding DDT is deemed valid at 2116, the application 102 of FIG. 1 can accept the utterance candidate and respond with the accepted utterance at 2118. That is, the accepted utterance (e.g., answer 132) may be provided as a response and added to the dialog 130 of FIG. 1.
Fig. 33 is a flow diagram depicting an exemplary process 3300 for managing conversations with DDTs in accordance with at least one embodiment. In some embodiments, method 3300 may be performed by computing device 101 of fig. 1 (e.g., application 102, DDT generator 104, DDT processing module 106, etc.). The operations of method 3300 may be performed in any suitable order. In some embodiments, method 3300 may include more operations than depicted in fig. 33 or fewer operations than depicted in fig. 33.
Method 3300 can begin at 3302, where a request is received from a user device, the request including an utterance of a conversation between two entities. For example, the utterance "Can my first lesson be on the butterfly stroke?" may be received from the user device.
At 3304, a conversation instance can be generated based at least in part on merging a plurality of utterances previously provided by either of the two entities. In some embodiments, the plurality of utterances includes the utterance of the request. By way of non-limiting example, the utterances of the conversation 130 that have already been received may be combined to form the text 128 of FIG. 1.
At 3306, a set of candidate responses to the requested utterance may be identified from the candidate response corpus. In some embodiments, the corpus may be stored in answer database 105 of FIG. 1. Identifying these candidate responses may be performed in any suitable manner as discussed in the above-mentioned figures with respect to matching questions to answers and/or matching requests to replies.
At 3308, a conversational utterance tree for the conversation instance and the candidate response can be generated. In some embodiments, the conversational utterance tree includes nodes corresponding to basic utterance units that represent text fragments of the plurality of utterances and of the candidate response. At least one non-terminal node of the conversational utterance tree may represent a rhetorical relation between two basic utterance units, and each terminal node may be associated with a basic utterance unit. In some embodiments, the conversational utterance tree includes at least one node that represents a dialog-specific rhetorical relation between two utterances of the conversation instance. FIGS. 25 and 26 depict example conversational utterance trees.
At 3310, the conversational utterance tree of candidate responses may be classified (e.g., valid or invalid) using a first machine learning model. In some embodiments, the first machine learning model may be previously trained using supervised learning techniques and a training data set that includes a plurality of conversational utterance trees previously labeled as valid or invalid.
At 3312, the candidate response may be provided in response to the request based at least in part on the classification of its conversational utterance tree. For example, if the conversational utterance tree of the candidate response is classified as valid (e.g., maintaining an appropriate rhetorical flow between utterances), the candidate response may be provided in response to the request.
Exemplary System
FIG. 34 depicts a simplified diagram of a distributed system 3400 for implementing one of these aspects. In the illustrated aspect, the distributed system 3400 includes one or more client computing devices 3402, 3404, 3406, and 3408 configured to execute and operate client applications, such as web browsers, proprietary clients (e.g., Oracle Forms), and the like, over one or more networks 3410. Server 3412 may be communicatively coupled to remote client computing devices 3402, 3404, 3406, and 3408 via network 3410.
In various aspects, the server 3412 may be adapted to run one or more services or software applications provided by one or more components of the system. Services or software applications may include non-virtual and virtual environments. Virtual environments may include environments for virtual events, trade shows, simulators, classrooms, shopping transactions, and businesses, whether they be two-dimensional or three-dimensional (3D) representations, page-based logical environments, or other environments. In some aspects, these services may be provided to users of client computing devices 3402, 3404, 3406, and/or 3408 as web-based services or cloud services or under a software as a service (SaaS) model. Users operating client computing devices 3402, 3404, 3406, and/or 3408, in turn, may utilize one or more client applications to interact with server 3412 to take advantage of the services provided by these components.
In the configuration depicted in the figure, software components 3418, 3420, and 3422 of system 3400 are shown as being implemented on server 3412. In other aspects, one or more of the components of system 3400 and/or the services provided by these components may also be implemented by one or more of the client computing devices 3402, 3404, 3406, and/or 3408. A user operating a client computing device may then utilize one or more client applications to use the services provided by these components. These components may be implemented in hardware, firmware, software, or a combination thereof. It should be appreciated that a variety of system configurations are possible, which may differ from distributed system 3400. Thus, the aspect illustrated in the figure is one example of a distributed system for implementing an aspect system and is not intended to be limiting.
Client computing devices 3402, 3404, 3406, and/or 3408 may be portable handheld devices (e.g., an iPhone cellular phone, an iPad computing tablet, a personal digital assistant (PDA)) or wearable devices (e.g., a Google Glass head-mounted display) running software such as Microsoft Windows Mobile and/or a variety of mobile operating systems (e.g., iOS, Windows Phone, Android, BlackBerry 10, Palm OS, and the like), and being internet, e-mail, short message service (SMS), BlackBerry, or other communication protocol enabled. The client computing devices may be general purpose personal computers including, by way of example, personal computers and/or laptop computers running various versions of the Microsoft Windows, Apple Macintosh, and/or Linux operating systems. The client computing devices may be workstation computers running any of a variety of commercially available UNIX or UNIX-like operating systems, including without limitation the variety of GNU/Linux operating systems, such as the Google Chrome OS. Alternatively or additionally, the client computing devices 3402, 3404, 3406, and 3408 may be any other electronic device capable of communicating over the network(s) 3410, such as a thin-client computer, an internet-enabled gaming system (e.g., a Microsoft Xbox gaming console, with or without a Kinect gesture input device), and/or a personal messaging device.
Although exemplary distributed system 3400 is shown with four client computing devices, any number of client computing devices may be supported. Other devices (e.g., devices with sensors, etc.) may interact with server 3412.
The network(s) 3410 in the distributed system 3400 may be any type of network familiar to those skilled in the art that can support data communications using any of a variety of commercially available protocols, including but not limited to TCP/IP (transmission control protocol/internet protocol), SNA (systems network architecture), IPX (internet packet exchange), AppleTalk, and the like. By way of example only, the network(s) 3410 may be a local area network (LAN), such as one based on Ethernet, Token-Ring, and/or the like. The network(s) 3410 may be a wide-area network or the internet. They may include a virtual network, including without limitation a virtual private network (VPN), an intranet, an extranet, a public switched telephone network (PSTN), an infrared network, a wireless network (e.g., a network operating under any of the Institute of Electrical and Electronics Engineers (IEEE) 802.11 suite of protocols, Bluetooth, and/or any other wireless protocol), and/or any combination of these and/or other networks.
Server 3412 may be composed of one or more general purpose computers, specialized server computers (including, by way of example, PC (personal computer) servers, UNIX servers, mid-range servers, mainframe computers, rack-mounted servers, etc.), server farms, server clusters, or any other appropriate arrangement and/or combination. Server 3412 may include one or more virtual machines running virtual operating systems or other computing architectures involving virtualization. One or more flexible pools of logical storage devices may be virtualized to maintain virtual storage devices for the server. Virtual networks may be controlled by server 3412 using software-defined networking. In various aspects, server 3412 may be adapted to run one or more services or software applications described in the foregoing disclosure. For example, server 3412 may correspond to a server for performing the processing described above according to an aspect of the present disclosure.
The server 3412 may run an operating system including any of those discussed above, as well as any commercially available server operating system. The server 3412 may also run any of a variety of additional server applications and/or mid-tier applications, including HTTP (hypertext transport protocol) servers, FTP (file transfer protocol) servers, CGI (common gateway interface) servers, JAVA servers, database servers, and the like. Exemplary database servers include, without limitation, those commercially available from Oracle, Microsoft, Sybase, IBM, and the like.
In some implementations, the server 3412 may include one or more applications to analyze and consolidate data feeds and/or event updates received from users of the client computing devices 3402, 3404, 3406, and 3408. As an example, data feeds and/or event updates may include, but are not limited to, Twitter feeds, Facebook updates, or real-time updates received from one or more third party information sources, and continuous data streams, which may include real-time events related to sensor data applications, financial tickers, network performance measuring tools (e.g., network monitoring and traffic management applications), clickstream analysis tools, automobile traffic monitoring, and the like. The server 3412 may also include one or more applications to display the data feeds and/or real-time events via one or more display devices of the client computing devices 3402, 3404, 3406, and 3408.
The distributed system 3400 may also include one or more databases 3414 and 3416. Databases 3414 and 3416 may reside in a variety of locations. By way of example, one or more of databases 3414 and 3416 may reside on (and/or be resident in) a non-transitory storage medium local to server 3412. Alternatively, databases 3414 and 3416 may be remote from server 3412 and in communication with server 3412 via a network-based or dedicated connection. In one set of aspects, databases 3414 and 3416 may reside in a storage-area network (SAN). Similarly, any necessary files for performing the functions attributed to server 3412 may be stored locally on server 3412 and/or remotely, as appropriate. In one set of aspects, databases 3414 and 3416 may include relational databases, such as databases provided by Oracle Corporation, that are adapted to store, update, and retrieve data in response to SQL-formatted commands.
Fig. 35 is a simplified block diagram of one or more components of a system environment 3500 through which services provided by one or more components of an aspect system may be offered as cloud services, in accordance with an aspect of the present disclosure. In the illustrated aspect, system environment 3500 includes one or more client computing devices 3504, 3506, and 3508, which may be used by users to interact with a cloud infrastructure system 3502 that provides cloud services. The client computing devices may be configured to operate a client application, such as a web browser, a proprietary client application (e.g., Oracle Forms), or some other application, which may be used by a user of the client computing device to interact with cloud infrastructure system 3502 to use services provided by cloud infrastructure system 3502.
It should be appreciated that the cloud infrastructure system 3502 depicted in the figure can have other components in addition to those depicted. Further, the aspects shown in the figures are only one example of a cloud infrastructure system that may incorporate aspects of the present invention. In some other aspects, cloud infrastructure system 3502 can have more or fewer components than shown in the figures, can combine two or more components, or can have a different configuration or arrangement of components.
Client computing devices 3504, 3506, and 3508 may be devices similar to those described above for 3402, 3404, 3406, and 3408.
Although exemplary system environment 3500 is shown with three client computing devices, any number of client computing devices may be supported. Other devices (e.g., devices with sensors, etc.) can interact with cloud infrastructure system 3502.
Network(s) 3510 may facilitate data communication and exchange between clients 3504, 3506, and 3508 and cloud infrastructure system 3502. Each network may be any type of network familiar to those skilled in the art that can support data communications using any of a variety of commercially available protocols, including those described above for network(s) 3410.
The cloud infrastructure system 3502 can include one or more computers and/or servers, which can include those described above for the server 3412.
In certain aspects, the services provided by the cloud infrastructure system may include a host of services made available to users of the cloud infrastructure system on demand, such as online data storage and backup solutions, Web-based e-mail services, hosted office suites and document collaboration services, database processing, managed technical support services, and the like. Services provided by the cloud infrastructure system can dynamically scale to meet the needs of its users. A specific instantiation of a service provided by the cloud infrastructure system is referred to herein as a "service instance." In general, any service made available to a user via a communication network (such as the Internet) from a cloud service provider's system is referred to as a "cloud service." Typically, in a public cloud environment, the servers and systems that make up the cloud service provider's system are different from the customer's own on-premises servers and systems. For example, a cloud service provider's system may host an application, and a user may order and use the application on demand via a communication network such as the Internet.
In some examples, a service in a computer network cloud infrastructure may include protected computer network access to storage, a hosted database, a hosted web server, a software application, or other services provided to users by a cloud provider, or as otherwise known in the art. For example, a service can include password-protected access to remote storage on the cloud through the Internet. As another example, a service can include a web service-based hosted relational database and a script-language middleware engine for private use by networked developers. As another example, a service can include access to an email software application hosted on a cloud provider's website.
In certain aspects, cloud infrastructure system 3502 may include a suite of application, middleware, and database service offerings that are delivered to customers in a self-service, subscription-based, elastically scalable, reliable, highly available, and secure manner. An example of such a cloud infrastructure system is the Oracle Public Cloud provided by the present assignee.
Large volumes of data, sometimes referred to as big data, can be hosted and/or manipulated by the infrastructure system on many levels and at different scales. Such data can include data sets that are so large and complex that they are difficult to process using typical database management tools or traditional data processing applications. For example, terabytes of data may be difficult to store, retrieve, and process using personal computers or their rack-based counterparts. Such large data is difficult to work with using most current relational database management systems and desktop statistics and visualization packages. It can require massively parallel processing software running on thousands of server computers, beyond the structure of commonly used software tools, to capture, curate, manage, and process the data within a tolerable elapsed time.
Analysts and researchers can store and manipulate extremely large data sets to visualize large amounts of data, detect trends, and/or otherwise interact with the data. Tens, hundreds, or thousands of processors linked in parallel can act upon such data in order to present it, or to simulate external forces on the data or what it represents. These data sets can involve structured data, such as data organized in a database or otherwise according to a structured model, and/or unstructured data (e.g., e-mails, images, data blobs, web pages, complex event processing). By leveraging the ability to relatively quickly focus more (or fewer) computing resources upon an objective, the cloud infrastructure system may be better available to carry out tasks on large data sets based on demand from a business, government agency, research organization, private individual, group of like-minded individuals or organizations, or other entity.
In various aspects, the cloud infrastructure system 3502 may be adapted to automatically provision, manage, and track a customer's subscriptions to services offered by the cloud infrastructure system 3502. The cloud infrastructure system 3502 may provide cloud services via different deployment models. For example, services may be provided under a public cloud model, in which cloud infrastructure system 3502 is owned by an organization selling cloud services (e.g., owned by Oracle Corporation) and the services are made available to the general public or to enterprises in different industries. As another example, services may be provided under a private cloud model, in which cloud infrastructure system 3502 is operated solely for a single organization and may provide services for one or more entities within the organization. Cloud services may also be provided under a community cloud model, in which cloud infrastructure system 3502 and the services provided by cloud infrastructure system 3502 are shared by several organizations in a related community. Cloud services may also be provided under a hybrid cloud model, which is a combination of two or more different models.
In some aspects, the services provided by the cloud infrastructure system 3502 may include one or more services provided under a Software as a Service (SaaS) category, a Platform as a Service (PaaS) category, an Infrastructure as a Service (IaaS) category, or other categories of services including hybrid services. A customer, via a subscription order, may order one or more services provided by cloud infrastructure system 3502. The cloud infrastructure system 3502 then performs processing to provide the services in the customer's subscription order.
In some aspects, the services provided by the cloud infrastructure system 3502 can include, but are not limited to, application services, platform services, and infrastructure services. In some examples, the cloud infrastructure system may provide application services via a SaaS platform. The SaaS platform may be configured to provide cloud services that fall under the SaaS category. For example, a SaaS platform may provide the ability to build and deliver a suite of on-demand applications on an integrated development and deployment platform. The SaaS platform may manage and control the underlying software and infrastructure used to provide the SaaS services. By utilizing services provided by the SaaS platform, a customer may utilize applications executing on the cloud infrastructure system. The customer can obtain the application service without purchasing a separate license and support. Various different SaaS services may be provided. Examples include, but are not limited to, services that provide sales performance management, enterprise integration, and business flexibility solutions for large organizations.
In some aspects, the cloud infrastructure system may provide platform services via a PaaS platform. The PaaS platform may be configured to provide cloud services that fall under the PaaS category. Examples of platform services may include, but are not limited to, services that enable organizations (such as Oracle Corporation) to consolidate existing applications on a shared, common architecture, as well as the ability to build new applications that leverage the shared services provided by the platform. The PaaS platform may manage and control the underlying software and infrastructure for providing the PaaS services. Customers can acquire the PaaS services provided by the cloud infrastructure system without the need to purchase separate licenses and support. Examples of platform services include, without limitation, Oracle Java Cloud Service (JCS), Oracle Database Cloud Service (DBCS), and others.
By utilizing the services provided by the PaaS platform, customers can employ programming languages and tools supported by the cloud infrastructure system and also control the deployed services. In some aspects, platform services provided by the cloud infrastructure system may include database cloud services, middleware cloud services (e.g., Oracle Fusion Middleware services), and Java cloud services. In one aspect, database cloud services may support shared service deployment models that enable organizations to pool database resources and offer customers a database as a service in the form of a database cloud. In the cloud infrastructure system, middleware cloud services may provide a platform for customers to develop and deploy various business applications, and Java cloud services may provide a platform for customers to deploy Java applications.
IaaS platforms in a cloud infrastructure system may provide a variety of different infrastructure services. Infrastructure services facilitate the management and control of underlying computing resources (e.g., storage, networks, and other underlying computing resources) by customers that utilize services provided by SaaS platforms and PaaS platforms.
In certain aspects, the cloud infrastructure system 3502 may also include infrastructure resources 3530 for providing the resources used to provide various services to customers of the cloud infrastructure system. In one aspect, infrastructure resources 3530 may include pre-integrated and optimized combinations of hardware, such as servers, storage, and networking resources, to execute the services provided by the PaaS platform and the SaaS platform.
In some aspects, resources in cloud infrastructure system 3502 may be shared by multiple users and dynamically reallocated per demand. Additionally, resources may be allocated to users in different time zones. For example, cloud infrastructure system 3502 may enable a first set of users in a first time zone to utilize resources of the cloud infrastructure system for a specified number of hours and then enable the reallocation of the same resources to another set of users located in a different time zone, thereby maximizing the utilization of resources.
In certain aspects, multiple internally shared services 3532 may be provided that are shared by different components or modules of the cloud infrastructure system 3502, as well as shared by services provided by the cloud infrastructure system 3502. These internal shared services may include, but are not limited to, security and identity services, aggregation services, enterprise repository services, enterprise manager services, virus scanning and whitelisting services, high availability, backup and restore services, services for implementing cloud support, email services, notification services, file transfer services, and the like.
In certain aspects, the cloud infrastructure system 3502 may provide comprehensive management of cloud services (e.g., SaaS, PaaS, and IaaS services) in the cloud infrastructure system. In one aspect, cloud management functionality may include capabilities for provisioning, managing, and tracking a customer's subscription received by the cloud infrastructure system 3502, and the like.
In one aspect, as depicted in the figure, cloud management functionality may be provided by one or more modules, such as an order management module 3520, an order orchestration module 3522, an order provisioning module 3524, an order management and monitoring module 3526, and an identity management module 3528. These modules may include or be provided using one or more computers and/or servers, which may be general purpose computers, special purpose server computers, server farms, server clusters, or any other suitable arrangement and/or combination.
In an exemplary operation 3534, a customer using a client device, such as client device 3504, 3506, or 3508, may interact with cloud infrastructure system 3502 by requesting one or more services provided by cloud infrastructure system 3502 and placing an order for a subscription to one or more services offered by cloud infrastructure system 3502. In certain aspects, the customer may access a cloud user interface (UI), such as cloud UI 3512, cloud UI 3514, and/or cloud UI 3516, and place a subscription order via these UIs. The order information received by cloud infrastructure system 3502 in response to the customer placing the order may include information identifying the customer and one or more services offered by the cloud infrastructure system 3502 to which the customer intends to subscribe.
After the order has been placed by the customer, the order information is received via the cloud UIs 3512, 3514, and/or 3516.
At operation 3536, the order is stored in an order database 3518. Order database 3518 may be one of several databases operated by cloud infrastructure system 3502 and operated in conjunction with other system elements.
At operation 3538, the order information is forwarded to an order management module 3520. In some instances, order management module 3520 may be configured to perform billing and accounting functions related to the order, such as verifying the order and, upon verification, booking the order.
At operation 3540, information regarding the order is communicated to an order orchestration module 3522. Order orchestration module 3522 may use the order information to orchestrate the provision of services and resources for the orders placed by the customers. In some instances, order orchestration module 3522 may use services of order provisioning module 3524 to orchestrate the provisioning of resources to support subscribed services.
In certain aspects, order orchestration module 3522 enables the management of business processes associated with each order and applies business logic to determine whether an order should proceed to provisioning. At operation 3542, upon receiving an order for a new subscription, the order orchestration module 3522 sends a request to order provisioning module 3524 to allocate resources and configure those resources needed to fulfill the subscription order. The order provisioning module 3524 enables the allocation of resources for the services ordered by the customer. Order provisioning module 3524 provides a level of abstraction between the cloud services provided by cloud infrastructure system 3502 and the physical implementation layer that is used to provision the resources for providing the requested services. Order orchestration module 3522 may thus be isolated from implementation details, such as whether or not services and resources are actually provisioned in real time or pre-provisioned and only allocated/assigned upon request.
At operation 3544, once the services and resources are provisioned, the order provisioning module 3524 of the cloud infrastructure system 3502 may send a notification of the services provided to the customer on the client devices 3504, 3506, and/or 3508.
At operation 3546, the customer's subscription orders may be managed and tracked by the order management and monitoring module 3526. In some instances, the order management and monitoring module 3526 may be configured to collect usage statistics for the services in the subscription order, such as the amount of memory used, the amount of data transferred, the number of users, and system power-on and system power-off times.
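As an illustration of the order flow of operations 3534 through 3546, the following minimal Python sketch summarizes the lifecycle of a subscription order. The class and method names (CloudInfrastructureSystem, place_order, provision, record_usage) are hypothetical stand-ins for the modules described above and are not part of any actual cloud product API.

    from dataclasses import dataclass, field

    @dataclass
    class Order:
        customer_id: str
        services: list
        status: str = "received"
        usage: dict = field(default_factory=dict)

    class CloudInfrastructureSystem:
        def __init__(self):
            self.order_database = []  # stands in for order database 3518

        def place_order(self, customer_id, services):
            order = Order(customer_id, services)
            self.order_database.append(order)   # operation 3536: store the order
            order.status = "booked"             # operation 3538: verify and book
            self.provision(order)               # operations 3540-3542: orchestrate and provision
            print(f"Notified {customer_id}: services provisioned")  # operation 3544
            return order

        def provision(self, order):
            # Abstraction between ordered services and the physical resources
            # that back them (order provisioning module 3524).
            order.status = "provisioned"

        def record_usage(self, order, metric, amount):
            # Operation 3546: collect usage statistics for the subscription.
            order.usage[metric] = order.usage.get(metric, 0) + amount

    system = CloudInfrastructureSystem()
    order = system.place_order("example-customer", ["database-cloud-service"])
    system.record_usage(order, "storage_gb", 120)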
In certain aspects, the cloud infrastructure system 3502 may include an identity management module 3528. Identity management module 3528 may be configured to provide identity services, such as access management and authorization services in cloud infrastructure system 3502. In some aspects, identity management module 3528 may control information about customers who wish to utilize the services provided by cloud infrastructure system 3502. Such information can include information that authenticates the identities of such customers and information that describes which actions those customers are authorized to perform relative to various system resources (e.g., files, directories, applications, communication ports, memory segments, etc.). Identity management module 3528 may also include the management of descriptive information about each customer and about how and by whom that descriptive information can be accessed and modified.
FIG. 36 illustrates an exemplary computer system 3600 in which aspects of the invention may be implemented. System 3600 may be used to implement any of the computer systems described above. As shown, computer system 3600 includes a processing unit 3604 that communicates with a number of peripheral subsystems via a bus subsystem 3602. These peripheral subsystems may include a processing acceleration unit 3606, an I/O subsystem 3608, a storage subsystem 3618, and a communication subsystem 3624. Storage subsystem 3618 includes tangible computer readable storage media 3622 and system memory 3610.
Bus subsystem 3602 provides a mechanism for letting the various components and subsystems of computer system 3600 communicate with each other as intended. Although bus subsystem 3602 is shown schematically as a single bus, alternative aspects of the bus subsystem may utilize multiple buses. Bus subsystem 3602 may be any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. For example, such architectures may include an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus, which can be implemented as a Mezzanine bus manufactured to the IEEE P1386.1 standard.
Processing unit 3604, which may be implemented as one or more integrated circuits (e.g., a conventional microprocessor or microcontroller), controls the operation of computer system 3600. One or more processors may be included in the processing unit 3604. These processors may include single-core processors or multi-core processors. In certain aspects, the processing unit 3604 may be implemented as one or more independent processing units 3632 and/or 3634, each including a single-core processor or a multi-core processor. In other aspects, processing unit 3604 may also be implemented as a quad-core processing unit formed by integrating two dual-core processors into a single chip.
In various aspects, the processing unit 3604 may execute various programs in response to program code and may maintain multiple concurrently executing programs or processes. At any given time, some or all of the program code to be executed may reside in processor(s) 3604 and/or storage subsystem 3618. The processor(s) 3604 may provide the various functions described above through appropriate programming. The computer system 3600 may additionally include a processing acceleration unit 3606, which may include a Digital Signal Processor (DSP), special purpose processor, or the like.
The I/O subsystem 3608 may include user interface input devices and user interface output devices. User interface input devices may include a keyboard, pointing devices such as a mouse or trackball, a touchpad or touch screen incorporated into a display, a scroll wheel, a click wheel, a dial, a button, a switch, a keypad, audio input devices with voice command recognition systems, microphones, and other types of input devices. User interface input devices may include, for example, motion sensing and/or gesture recognition devices, such as the Microsoft Kinect motion sensor, that enable users to control and interact with an input device, such as the Microsoft Xbox 360 game controller, through a natural user interface using gestures and spoken commands. User interface input devices may also include eye gesture recognition devices, such as the Google Glass blink detector, that detect eye activity from users (e.g., 'blinking' while taking pictures and/or making a menu selection) and transform the eye gestures as input into an input device (e.g., Google Glass). Additionally, user interface input devices may include voice recognition sensing devices that enable users to interact with voice recognition systems (e.g., the Siri navigator) through voice commands.
User interface input devices may also include, without limitation, three dimensional (3D) mice, joysticks or pointing sticks, gamepads and graphic tablets, and audio/visual devices such as speakers, digital cameras, digital camcorders, portable media players, webcams, image scanners, fingerprint scanners, barcode readers, 3D scanners, 3D printers, laser rangefinders, and eye gaze tracking devices. Additionally, user interface input devices may include, for example, medical imaging input devices such as computed tomography, magnetic resonance imaging, positron emission tomography, and medical ultrasonography devices. User interface input devices may also include, for example, audio input devices such as MIDI keyboards, digital musical instruments, and the like.
The user interface output devices may include a display subsystem, indicator lights, or a non-visual display such as an audio output device. The display subsystem may be a Cathode Ray Tube (CRT), a flat panel device (e.g., a flat panel device using a Liquid Crystal Display (LCD) or a plasma display), a projection device, a touch screen, etc. In general, use of the term "output device" is intended to include all possible types of devices and mechanisms for outputting information from computer system 3600 to a user or other computer. For example, user interface output devices may include, but are not limited to, various display devices that visually convey text, graphics, and audio/video information, such as monitors, printers, speakers, headphones, car navigation systems, plotters, voice output devices, and modems.
Computer system 3600 may include a storage subsystem 3618 that includes software elements shown as being currently located within system memory 3610. System memory 3610 may store program instructions that are loadable and executable on processing unit 3604 and data generated during the execution of these programs.
Depending on the configuration and type of computer system 3600, system memory 3610 may be volatile (such as random access memory (RAM)) and/or non-volatile (such as read-only memory (ROM), flash memory, etc.). The RAM typically contains data and/or program modules that are immediately accessible to and/or presently being operated and executed by processing unit 3604. In some implementations, system memory 3610 may include multiple different types of memory, such as static random access memory (SRAM) or dynamic random access memory (DRAM). In some implementations, a basic input/output system (BIOS), containing the basic routines that help to transfer information between elements within computer system 3600, such as during start-up, may typically be stored in the ROM. By way of example, and not limitation, system memory 3610 is illustrated as including application programs 3612 (which may include client applications, web browsers, mid-tier applications, relational database management systems (RDBMS), etc.), program data 3614, and an operating system 3616. By way of example, operating system 3616 may include various versions of Microsoft Windows, Apple Macintosh, and/or Linux operating systems, a variety of commercially available UNIX or UNIX-like operating systems (including without limitation the variety of GNU/Linux operating systems and the Google Chrome OS), and/or mobile operating systems such as iOS, Windows Phone, Android OS, BlackBerry 10 OS, and Palm OS operating systems.
Storage subsystem 3618 may also provide a tangible computer readable storage medium for storing the basic programming and data constructs that provide the functionality of some aspects. Software (programs, code modules, instructions) that when executed by a processor provide the functionality described above may be stored in storage subsystem 3618. These software modules or instructions may be executed by the processing unit 3604. Storage subsystem 3618 may also provide a repository for storing data used in accordance with the present invention.
Storage subsystem 3618 may also include a computer-readable storage media reader 3620 that can further be connected to computer-readable storage media 3622. Together and, optionally, in combination with system memory 3610, computer-readable storage media 3622 may comprehensively represent remote, local, fixed, and/or removable storage devices plus storage media for temporarily and/or more permanently containing, storing, transmitting, and retrieving computer-readable information.
Computer-readable storage media 3622 containing the code or portions of code may also include any suitable media known or used in the art, including storage media and communication media such as, but not limited to, volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage and/or transmission of information. The media may include tangible, non-transitory computer-readable storage media, such as RAM, ROM, electrically Erasable Programmable ROM (EEPROM), flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or other tangible computer-readable media. When specified, the media can also include non-tangible transitory computer-readable media, such as data signals, data transmissions, or any other medium that can be used to transmit desired information and that can be accessed by the computing system 3600.
By way of example, computer-readable storage media 3622 may include a hard disk drive that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive that reads from or writes to a removable, nonvolatile magnetic disk, and an optical disk drive that reads from or writes to a removable, nonvolatile optical disk such as a CD ROM, DVD, or Blu-Ray disk, or other optical media. Computer-readable storage media 3622 may include, but is not limited to, Zip drives, flash memory cards, universal serial bus (USB) flash drives, secure digital (SD) cards, DVD disks, digital video tape, and the like. Computer-readable storage media 3622 may also include solid-state drives (SSDs) based on non-volatile memory, such as flash-memory based SSDs, enterprise flash drives, and solid state ROM, SSDs based on volatile memory, such as solid state RAM, dynamic RAM, and static RAM, DRAM-based SSDs, magnetoresistive RAM (MRAM) SSDs, and hybrid SSDs that use a combination of DRAM and flash-memory based SSDs. The disk drives and their associated computer-readable media may provide non-volatile storage of computer-readable instructions, data structures, program modules, and other data for computer system 3600.
Communication subsystem 3624 provides an interface to other computer systems and networks. Communication subsystem 3624 serves as an interface for receiving data from, and transmitting data to, other systems from computer system 3600. For example, communication subsystem 3624 may enable computer system 3600 to connect to one or more devices via the Internet. In some aspects, communication subsystem 3624 can include radio frequency (RF) transceiver components (e.g., using cellular telephone technology, advanced data network technology such as 3G, 4G, or EDGE (enhanced data rates for global evolution), WiFi (IEEE 802.11 family standards), or other mobile communication technologies, or any combination thereof) for accessing wireless voice and/or data networks, global positioning system (GPS) receiver components, and/or other components.
In some aspects, communication subsystem 3624 may also receive incoming communications in the form of structured and/or unstructured data feeds 3626, event streams 3628, event updates 3630, and the like, on behalf of one or more users who may use computer system 3600.
For example, the communication subsystem 3624 may be configured to receive unstructured data feeds 3626 in real-time from users of social media networks and/or other communication services, such as Twitter feeds, Facebook updates, web feeds such as Rich Site Summary (RSS) feeds, and/or real-time updates from one or more third party information sources.
Additionally, the communication subsystem 3624 may also be configured to receive data in the form of a continuous data stream that may include an event stream 3628 of real-time events (continuous or unbounded in nature, which may not have an explicit ending) and/or event updates 3630. Examples of applications that generate continuous data may include, for example, sensor data applications, financial tickers, network performance measurement tools (e.g., network monitoring and traffic management applications), clickstream analysis tools, automotive traffic monitoring, and so forth.
The communication subsystem 3624 may also be configured to output structured and/or unstructured data feeds 3626, event streams 3628, event updates 3630, and the like to one or more databases that may be in communication with one or more streaming data source computers coupled to the computer system 3600.
Computer system 3600 may be one of various types, including a hand-portable device (e.g., an iPhone cellular phone, an iPad computing tablet, a PDA), a wearable device (e.g., a Google Glass head mounted display), a PC, a workstation, a mainframe, a kiosk, a server rack, or any other data processing system.
Due to the ever-changing nature of computers and networks, the description of computer system 3600 depicted in the figures is intended only as a specific example. Many other configurations are possible with more or fewer components than the system depicted in the figures. For example, customized hardware might also be used and/or particular elements might be implemented in hardware, firmware, software (including applets), or a combination. Further, connections to other computing devices, such as network input/output devices, may be employed. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will appreciate other ways and/or methods to implement the various aspects.
In the foregoing specification, aspects of the invention have been described with reference to specific aspects thereof, but those skilled in the art will recognize that the invention is not limited thereto. Various features and aspects of the above-described invention may be used alone or in combination. Further, aspects may be utilized in any number of environments and application environments other than those described herein without departing from the broader spirit and scope of the specification. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims (20)

1. A computer-implemented method for managing conversations using one or more conversation utterance trees, the method comprising:
receiving a request from a user device, the request including an utterance of a conversation between two entities;
generating a conversation instance based at least in part on merging a plurality of utterances previously provided by either of the two entities, the plurality of utterances including the requested utterance;
identifying a set of candidate responses for the requested utterance from a corpus of candidate responses;
generating, for a candidate response of the set of candidate responses, a conversational utterance tree for the conversation instance and the candidate response, the conversational utterance tree including nodes corresponding to base utterance units that represent text fragments of the plurality of utterances and of the candidate response, at least one non-terminating node of the nodes in the conversational utterance tree representing a rhetorical relationship between two base utterance units, and each terminating node of the nodes of the conversational utterance tree being associated with a base utterance unit, the conversational utterance tree including at least one node representing a conversation-specific rhetorical relationship between two utterances of the conversation instance;
classifying the conversational utterance trees of the candidate responses using a first machine learning model, the first machine learning model previously trained using supervised learning techniques and a training data set, the training data set including a plurality of conversational utterance trees previously labeled as valid or invalid; and
providing the candidate response to respond to the request based at least in part on the classifying of the conversational utterance tree of the candidate response.
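Purely as an editorial illustration, and not as part of the claim language, the method of claim 1 can be sketched in Python as follows. The tree builder, the trained classifier, and the relevance filter are hypothetical stand-ins for components the claim assumes to exist.

    def respond_to_request(request_utterance, prior_utterances, candidate_corpus,
                           build_tree, classifier):
        # Merge the previously provided utterances with the utterance of the
        # request to form a conversation instance.
        conversation_instance = prior_utterances + [request_utterance]

        # Identify a set of candidate responses from the corpus.
        candidates = [c for c in candidate_corpus
                      if topically_related(c, request_utterance)]

        for candidate in candidates:
            # Build a conversational utterance tree spanning the conversation
            # instance and the candidate response; terminating nodes hold base
            # utterance units, non-terminating nodes hold rhetorical
            # relationships, including conversation-specific ones.
            tree = build_tree(conversation_instance, candidate)

            # Classify the tree with a model trained on conversational
            # utterance trees labeled valid or invalid; provide the first
            # candidate whose tree preserves the rhetorical flow.
            if classifier.predict(tree) == "valid":
                return candidate
        return None

    def topically_related(candidate, utterance):
        # Placeholder relevance check (simple word overlap); illustrative only.
        return bool(set(candidate.lower().split()) & set(utterance.lower().split()))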
2. The computer-implemented method of claim 1, wherein identifying the set of candidate responses from a candidate response corpus further comprises:
determining, for the utterance of the request, a first communication utterance tree including a question root node;
determining a second communication utterance tree for the candidate response, wherein the second communication utterance tree includes an answer root node;
in response to identifying that the question root node and the answer root node are the same, merging the first communication utterance tree and the second communication utterance tree to form a merged communication utterance tree;
calculating a level of complementarity between the first communication utterance tree and the second communication utterance tree by providing the merged communication utterance tree to a second machine learning model, the second machine learning model previously trained to determine a level of complementarity of sub-trees of two communication utterance trees; and
identifying the requested utterance and the candidate response as complementary in response to determining that the level of complementarity is above a threshold.
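A schematic reading of claim 2 in the same illustrative style: the question and answer communication utterance trees are merged only when their root nodes agree, and a second model scores the merged tree. The tree objects and the model are assumed interfaces, not an actual API.

    def are_complementary(question_tree, answer_tree, complementarity_model,
                          threshold=0.5):
        # Merge only if the question root node and the answer root node
        # are the same.
        if question_tree.root_label != answer_tree.root_label:
            return False
        merged = {"root": question_tree.root_label,
                  "children": [question_tree, answer_tree]}
        # The second machine learning model was previously trained to
        # determine the level of complementarity of sub-trees of two
        # communication utterance trees.
        level = complementarity_model.score(merged)
        return level > threshold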
3. The computer-implemented method of claim 1, wherein classifying the conversational utterance tree comprises classifying the conversational utterance tree of the candidate response as valid or invalid, wherein a valid classification indicates that a proper rhetorical flow between utterances corresponding to the conversational utterance tree is preserved, and wherein an invalid classification indicates that the proper rhetorical flow between utterances corresponding to the conversational utterance tree is disrupted.
4. The computer-implemented method of claim 1, further comprising:
generating the training dataset for the first machine learning model based at least in part on generating a plurality of conversation instances from a corpus of documents, wherein generating a conversation instance from a document further comprises:
splitting the input text of the document into a set of text segments;
establishing a communication utterance tree for a text passage in the group of text passages;
identifying a set of satellite base utterance units of the communication utterance tree of the text passage;
selecting an entity or attribute from a satellite base utterance unit;
generating a query from the entity or the attribute selected from the satellite base utterance unit;
executing the query against a knowledge base;
generating a question corresponding to the satellite base utterance unit based at least in part on one or more search results obtained by executing the query;
updating the communication utterance tree based at least in part on inserting the question as a new node, the new node inserted based at least in part on the satellite base utterance unit; and
generating the conversation instance using the updated communication utterance tree.
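The training-data construction of claim 4 can likewise be pictured with the following sketch. The parser, knowledge base, and question generator are hypothetical callables assumed for the illustration only.

    def build_conversation_instances(documents, parse_communication_tree,
                                     knowledge_base, make_question):
        instances = []
        for document in documents:
            for fragment in split_into_fragments(document):
                tree = parse_communication_tree(fragment)
                # Satellite base utterance units carry detail that can be
                # turned into clarifying questions.
                for satellite in tree.satellite_units():
                    entity = satellite.pick_entity_or_attribute()
                    results = knowledge_base.query(entity)        # execute the query
                    question = make_question(satellite, results)  # question from results
                    # Insert the question as a new node anchored at the satellite.
                    tree.insert_node(question, anchor=satellite)
                instances.append(tree.to_conversation_instance())
        return instances

    def split_into_fragments(document, size=3):
        # Naive splitter: groups of consecutive sentences; illustrative only.
        sentences = document.split(". ")
        return [". ".join(sentences[i:i + size])
                for i in range(0, len(sentences), size)]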
5. The computer-implemented method of claim 4, further comprising:
generating, based at least in part on the conversation instance, a second conversational utterance tree that includes second nodes corresponding to second base utterance units that represent second text fragments of the conversation instance, each non-terminating node of the nodes in the second conversational utterance tree representing a rhetorical relationship between two base utterance units and each terminating node of the nodes of the second conversational utterance tree being associated with a base utterance unit, the second conversational utterance tree including at least one conversation-specific rhetorical relationship between two utterances of the second conversational utterance tree;
associating the second conversational utterance tree with a label indicating that the second conversational utterance tree is valid; and
storing the second conversational utterance tree and the label as part of the training dataset used to train the first machine learning model.
6. The computer-implemented method of claim 1, wherein generating the conversational utterance tree further comprises:
generating an utterance tree for the candidate response, the utterance tree including a set of nodes, each non-terminating node in the set of nodes in the utterance tree representing a respective rhetorical relationship between two base utterance units, and each terminating node in the set of nodes of the utterance tree being associated with a particular base utterance unit;
identifying, in the utterance tree, a rhetorical association of type elaboration or joint, wherein the rhetorical association relates a first base utterance unit and a second base utterance unit, and wherein the first base utterance unit and the second base utterance unit form a reference sentence;
identifying an abstract meaning representation of a template based at least in part on identifying one or more common entities between the utterance tree and the abstract meaning representation of the template;
identifying a semantic association corresponding to the rhetorical association, wherein the semantic association corresponds to a word of the template; and
replacing the rhetorical association in the utterance tree with an updated rhetorical association corresponding to the semantic association.
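Finally, the relation-refinement step of claim 6 can be sketched as below: a coarse rhetorical association of type elaboration or joint is replaced with a more specific association recovered from an abstract meaning representation (AMR) template that shares entities with the utterance tree. All objects here are simplified stand-ins, not an actual library API.

    def refine_rhetorical_associations(utterance_tree, amr_templates):
        for relation in utterance_tree.relations():
            if relation.label not in ("elaboration", "joint"):
                continue
            # The two base utterance units joined by the association form
            # the reference sentence matched against the templates.
            reference = relation.first_unit.text + " " + relation.second_unit.text
            for template in amr_templates:
                # A template applies when it shares one or more entities
                # with the utterance tree.
                if template.entities() & utterance_tree.entities():
                    semantic = template.semantic_association(reference)
                    if semantic is not None:
                        relation.label = semantic  # updated association
                        break
        return utterance_tree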
7. The computer-implemented method of claim 1, wherein providing the candidate response as part of the conversation and responding to the request is further based at least in part on determining that the candidate response is topically relevant to the utterance of the request.
8. A non-transitory computer-readable medium storing computer-executable program instructions for managing conversations with one or more conversation utterance trees, the instructions, when executed, causing a processor to perform operations comprising:
receiving a request from a user device, the request including an utterance of a conversation between two entities;
generating a conversation instance based at least in part on merging a plurality of utterances previously provided by either of the two entities, the plurality of utterances including the requested utterance;
identifying a set of candidate responses for the requested utterance from a corpus of candidate responses;
generating, for a candidate response of the set of candidate responses, a conversational utterance tree for the conversation instance and the candidate response, the conversational utterance tree including nodes corresponding to base utterance units that represent text fragments of the plurality of utterances and of the candidate response, at least one non-terminating node of the nodes in the conversational utterance tree representing a rhetorical relationship between two base utterance units, and each terminating node of the nodes of the conversational utterance tree being associated with a base utterance unit, the conversational utterance tree including at least one node representing a conversation-specific rhetorical relationship between two utterances of the conversation instance;
classifying the conversational utterance trees of the candidate responses using a first machine learning model, the first machine learning model previously trained using supervised learning techniques and a training data set, the training data set including a plurality of conversational utterance trees previously labeled as valid or invalid; and
providing the candidate response to respond to the request based at least in part on the classifying of the conversational utterance tree of the candidate response.
9. The non-transitory computer-readable medium of claim 8, wherein identifying the set of candidate responses from the candidate response corpus comprises operations for:
determining, for the utterance of the request, a first communication utterance tree including a question root node;
determining a second communication utterance tree for the candidate response, wherein the second communication utterance tree includes an answer root node;
in response to recognizing that the question root node and the answer root node are the same, merging the first communication utterance tree and the second communication utterance tree to form a merged communication utterance tree;
calculating a level of complementarity between the first communication utterance tree and the second communication utterance tree by providing the merged communication utterance tree to a second machine learning model, the second machine learning model previously trained to determine a level of complementarity of sub-trees of two communication utterance trees; and
identifying the requested utterance and the candidate response as complementary in response to determining that the level of complementarity is above a threshold.
10. The non-transitory computer-readable medium of claim 8, wherein classifying the conversational utterance tree comprises classifying the conversational utterance tree of the candidate response as valid or invalid, wherein a valid classification indicates that a proper rhetorical flow between utterances corresponding to the conversational utterance tree is preserved, and wherein an invalid classification indicates that the proper rhetorical flow between utterances corresponding to the conversational utterance tree is disrupted.
11. The non-transitory computer-readable medium of claim 8, wherein the operations further comprise:
generating the training dataset for the first machine learning model based at least in part on generating a plurality of conversation instances from a corpus of documents, wherein generating a conversation instance from a document further comprises:
splitting input text of the document into a group of text segments;
establishing a communication utterance tree for a text passage in the group of text passages;
identifying a set of satellite base utterance units of the communication utterance tree of the text passage;
selecting an entity or attribute from a satellite base utterance unit;
generating a query from the entity or the attribute selected from the satellite base utterance unit;
executing the query against a knowledge base;
generating a question corresponding to the satellite base utterance unit based at least in part on one or more search results obtained by executing the query;
updating the communication utterance tree based at least in part on inserting the question as a new node, the new node inserted based at least in part on the satellite base utterance unit; and
generating the conversation instance using the updated communication utterance tree.
12. The non-transitory computer-readable medium of claim 11, wherein the operations further comprise:
generating a second conversational utterance tree based at least in part on the generated conversation instance, the second conversational utterance tree including second nodes corresponding to second base utterance units that represent second text fragments of the conversation instance, each non-terminating node of the nodes in the second conversational utterance tree representing a rhetorical relationship between two base utterance units and each terminating node of the nodes of the second conversational utterance tree being associated with a base utterance unit, the second conversational utterance tree including at least one conversation-specific rhetorical relationship between two utterances of the second conversational utterance tree;
associating the second conversational utterance tree with a label indicating that the second conversational utterance tree is valid; and
storing the second conversational utterance tree and the label as part of the training dataset used to train the first machine learning model.
13. The non-transitory computer-readable medium of claim 8, wherein generating the conversational utterance tree comprises further operations comprising:
generating an utterance tree for the candidate response, the utterance tree including a set of nodes, each non-terminating node in the set of nodes in the utterance tree representing a respective rhetorical relationship between two base utterance units, and each terminating node in the set of nodes of the utterance tree being associated with a particular base utterance unit;
identifying, in the utterance tree, a rhetorical association of type elaboration or joint, wherein the rhetorical association relates a first base utterance unit and a second base utterance unit, and wherein the first base utterance unit and the second base utterance unit form a reference sentence;
identifying an abstract meaning representation of a template based at least in part on identifying one or more common entities between the utterance tree and the abstract meaning representation of the template;
identifying a semantic association corresponding to the rhetorical association, wherein the semantic association corresponds to a word of the template; and
replacing the rhetorical association in the utterance tree with an updated rhetorical association corresponding to the semantic association.
14. The non-transitory computer-readable medium of claim 8, wherein providing the candidate response as part of the conversation and responding to the request is further based at least in part on determining that the candidate response is topically relevant to the utterance of the request.
15. A computing device, comprising:
a non-transitory computer readable medium storing computer executable program instructions for managing conversations with one or more conversation utterance trees; and
a processor communicatively coupled to the non-transitory computer-readable medium to execute the computer-executable program instructions, wherein execution of the computer-executable program instructions configures the computing device to perform operations comprising:
receiving a request from a user device, the request including an utterance of a conversation between two entities;
generating a conversation instance based at least in part on merging a plurality of utterances previously provided by either of the two entities, the plurality of utterances comprising an utterance of the request;
identifying a set of candidate responses for the requested utterance from a corpus of candidate responses;
generating, for a candidate response of the set of candidate responses, a conversational utterance tree for the conversation instance and the candidate response, the conversational utterance tree including nodes corresponding to base utterance units that represent text fragments of the plurality of utterances and of the candidate response, at least one non-terminating node of the nodes in the conversational utterance tree representing a rhetorical relationship between two base utterance units, and each terminating node of the nodes of the conversational utterance tree being associated with a base utterance unit, the conversational utterance tree including at least one node representing a conversation-specific rhetorical relationship between two utterances of the conversation instance;
classifying the conversational utterance trees of the candidate responses using a first machine learning model, the first machine learning model previously trained using supervised learning techniques and a training data set, the training data set including a plurality of conversational utterance trees previously labeled as valid or invalid; and
providing the candidate response as part of the conversation and responding to the request based at least in part on the classifying of the conversational utterance tree of the candidate response.
16. The computing device of claim 15, wherein to identify the set of candidate responses from the candidate response corpus comprises operations to:
determining, for the utterance of the request, a first communication utterance tree including a question root node;
determining a second communication utterance tree for the candidate response, wherein the second communication utterance tree includes an answer root node;
in response to recognizing that the question root node and the answer root node are the same, merging the first communication utterance tree and the second communication utterance tree to form a merged communication utterance tree;
calculating a level of complementarity between the first communication utterance tree and the second communication utterance tree by providing the merged communication utterance tree to a second machine learning model, the second machine learning model previously trained to determine a level of complementarity of sub-trees of two communication utterance trees; and
identifying the requested utterance and the candidate response as complementary in response to determining that the level of complementarity is above a threshold.
17. The computing device of claim 15, wherein classifying the conversational utterance tree comprises classifying the conversational utterance tree of the candidate response as valid or invalid, wherein a valid classification indicates that a proper rhetorical flow between utterances corresponding to the conversational utterance tree is preserved, and wherein an invalid classification indicates that the proper rhetorical flow between utterances corresponding to the conversational utterance tree is disrupted.
18. The computing device of claim 15, wherein the operations further comprise:
generating the training data set for the first machine learning model based at least in part on generating a plurality of conversation instances from a corpus of documents, wherein generating conversation instances from documents further comprises:
splitting the input text of the document into a set of text segments;
establishing a communication utterance tree for a text passage in the group of text passages;
identifying a set of satellite base utterance units of the communication utterance tree of the text passage;
selecting an entity or attribute from a satellite base utterance unit;
generating a query from the entity or the attribute selected from the satellite base utterance unit;
executing the query against a knowledge base;
generating a question corresponding to the satellite base utterance unit based at least in part on one or more search results obtained by executing the query;
updating the communication utterance tree based at least in part on inserting the question as a new node, the new node inserted based at least in part on the satellite base utterance unit; and
generating the conversation instance using the updated communication utterance tree.
19. The computing device of claim 18, wherein the operations further comprise:
generating a second conversational utterance tree based at least in part on the generated conversation instance, the second conversational utterance tree including second nodes corresponding to second base utterance units that represent second text fragments of the conversation instance, each non-terminating node of the nodes in the second conversational utterance tree representing a rhetorical relationship between two base utterance units and each terminating node of the nodes of the second conversational utterance tree being associated with a base utterance unit, the second conversational utterance tree including at least one conversation-specific rhetorical relationship between two utterances of the second conversational utterance tree;
associating the second conversation utterance tree with a label that indicates that the second conversation utterance tree is valid; and
storing the second dialog utterance tree and the labels as part of the training dataset used to train the first machine learning model.
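
A minimal sketch of the claim-19 bookkeeping, assuming a JSON-lines file as the training data set; the on-disk format and field names are invented for illustration only.

```python
# Hypothetical storage of a labeled conversational discourse tree.
import json
from pathlib import Path

def store_training_example(tree: dict, path: Path = Path("training_set.jsonl")) -> None:
    example = {"tree": tree, "label": "valid"}   # valid: rhetorical flow preserved
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(example) + "\n")

store_training_example({"relation": "elaboration", "edus": ["question", "answer"]})
```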
20. The computing device of claim 15, wherein generating the conversational discourse tree further comprises:
generating a discourse tree for the candidate response, the discourse tree comprising a set of nodes, each non-terminal node of the set of nodes representing a respective rhetorical relationship between two elementary discourse units, and each terminal node of the set of nodes being associated with a particular elementary discourse unit;
identifying, in the discourse tree, a rhetorical relation of type elaboration or joint, wherein the rhetorical relation relates a first elementary discourse unit and a second elementary discourse unit, and wherein the first elementary discourse unit and the second elementary discourse unit form a reference sentence;
identifying an abstract meaning representation (AMR) template based at least in part on identifying one or more common entities between the discourse tree and the AMR template;
identifying a semantic relation corresponding to the rhetorical relation, wherein the semantic relation corresponds to a word of the template; and
replacing the rhetorical relation in the discourse tree with an updated rhetorical relation corresponding to the semantic relation.
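
As a loose illustration of the claim-20 refinement, the sketch below swaps a generic elaboration/joint relation for a more specific relation drawn from an AMR-like template that shares entities with the reference sentence. The template store and the word-overlap matching heuristic are assumptions; real AMR matching would operate on parsed semantic graphs rather than word sets.

```python
# Hypothetical rhetorical-relation refinement via AMR-like templates.
from typing import Dict, Optional, Set

# Invented template store: shared entities -> a more specific semantic relation.
AMR_TEMPLATES = [
    {"entities": {"engine", "failure"}, "semantic_relation": "cause"},
    {"entities": {"payment", "refund"}, "semantic_relation": "condition"},
]

def entities_of(sentence: str) -> Set[str]:
    return {w.lower().strip(".,") for w in sentence.split()}

def match_template(sentence: str) -> Optional[Dict]:
    ents = entities_of(sentence)
    # A template "matches" when it shares at least one entity with the sentence.
    return next((t for t in AMR_TEMPLATES if t["entities"] & ents), None)

def refine_relation(tree: Dict, reference_sentence: str) -> Dict:
    if tree["relation"] in ("elaboration", "joint"):
        template = match_template(reference_sentence)
        if template is not None:
            return dict(tree, relation=template["semantic_relation"])
    return tree

tree = {"relation": "elaboration", "edus": ["the engine failed", "the car stopped"]}
print(refine_relation(tree, "The engine failure caused the stop."))  # relation -> "cause"
```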
CN202210451489.9A 2021-04-26 2022-04-26 Techniques for maintaining rhetorical flow Pending CN115392217A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202163179926P 2021-04-26 2021-04-26
US63/179,926 2021-04-26
US17/725,496 US12141535B2 (en) 2021-04-26 2022-04-20 Techniques for maintaining rhetorical flow
US17/725,496 2022-04-20

Publications (1)

Publication Number Publication Date
CN115392217A 2022-11-25

Family

ID=84115470

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210451489.9A Pending CN115392217A (en) 2021-04-26 2022-04-26 Techniques for maintaining rhetorical flow

Country Status (1)

Country Link
CN (1) CN115392217A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115982336A (en) * 2023-02-15 2023-04-18 创意信息技术股份有限公司 Dynamic dialogue state diagram learning method, device, system and storage medium
CN115982336B (en) * 2023-02-15 2023-05-23 创意信息技术股份有限公司 Dynamic dialogue state diagram learning method, device, system and storage medium
WO2024174195A1 (en) * 2023-02-24 2024-08-29 Huawei Cloud Computing Technologies Co., Ltd. Methods and systems for explainable template retrieval for optimization modeling
CN116680201A (en) * 2023-07-31 2023-09-01 南京争锋信息科技有限公司 System pressure testing method based on machine learning
CN116680201B (en) * 2023-07-31 2023-10-17 南京争锋信息科技有限公司 System pressure testing method based on machine learning

Similar Documents

Publication Publication Date Title
US11977568B2 (en) Building dialogue structure by using communicative discourse trees
JP7546096B6 (en) Enabling rhetorical analysis through the use of communicative discourse trees
JP7531649B2 (en) Building Virtual Discourse Trees to Improve Answers to Convergent Questions
US11694040B2 (en) Using communicative discourse trees to detect a request for an explanation
US11599731B2 (en) Generating recommendations by using communicative discourse trees of conversations
US11455494B2 (en) Automated building of expanded datasets for training of autonomous agents
US11720749B2 (en) Constructing conclusive answers for autonomous agents
US11775772B2 (en) Chatbot providing a defeating reply
CN115392217A (en) Techniques for maintaining rhetorical flow
CN114360711A (en) Multi-case based reasoning by syntactic-semantic alignment and discourse analysis
US11914961B2 (en) Relying on discourse trees to build ontologies
CN117015772A (en) Relying on discourse trees to build ontologies
CN114902230A (en) Improved discourse parsing
US20200356605A1 (en) Converting a document into a chatbot-accessible form via the use of communicative discourse trees
CN113761158A (en) Management of focused information sharing dialogs based on discourse trees
US12141535B2 (en) Techniques for maintaining rhetorical flow
US20220253611A1 (en) Techniques for maintaining rhetorical flow
WO2022150359A1 (en) Relying on discourse trees to build ontologies

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination