WO2018174815A1 - Method and apparatus for semantic coherence analysis of texts - Google Patents

Method and apparatus for semantic coherence analysis of texts

Info

Publication number
WO2018174815A1
Authority
WO
WIPO (PCT)
Prior art keywords
tuple
subgraph
sentence
words
machine learning
Prior art date
Application number
PCT/SG2017/050153
Other languages
French (fr)
Inventor
Shangfeng Hu
Jung Jae Kim
Rajaraman Kanagasabai
Original Assignee
Agency For Science, Technology And Research
Priority date
Filing date
Publication date
Application filed by Agency For Science, Technology And Research filed Critical Agency For Science, Technology And Research
Priority to PCT/SG2017/050153 priority Critical patent/WO2018174815A1/en
Publication of WO2018174815A1 publication Critical patent/WO2018174815A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Definitions

  • Various aspects of this disclosure generally relate to machine learning, and more particularly, to the training and applications of machine learning based models for semantic coherence analysis of texts.
  • Machine learning is a subfield of computer science that explores the study and construction of algorithms that can learn from and make predictions on data. Such algorithms make data-driven predictions or decisions through building a model from sample inputs. Machine learning is employed in a range of computing tasks where designing and programming explicit algorithms is infeasible. Within the field of data analytics, machine learning is a method used to devise complex models and algorithms that lend themselves to prediction. These analytical models allow researchers, data scientists, engineers, and analysts to produce reliable, repeatable decisions and results and uncover hidden insights through learning from historical relationships and trends in the data.
  • Semantic text analytics aim to analyze text by making sense of context, meaning, and domain knowledge.
  • semantic text analytics use natural language processing (NLP), ontologies, and/or machine learning (e.g., deep learning) approaches.
  • Various applications of semantic text analytics include information extraction, questions and answers (Q&A) systems, machine translation, etc.
  • Many text analytics problems involve the process of determining whether two text structures are coherent to each other when connected via a common term (pivot). Such process may be referred to as semantic text coherence analysis. Semantic text coherence analysis may be used in areas such as text summarization, information extraction, Q&A systems, dialogue systems, machine translation, and so on.
  • Supervised learning is the machine learning task of inferring a function from labeled training data. It is costly and time consuming to annotate training data sets manually.
  • Distant-supervised machine learning is the machine learning task of generating the labeled training data required for supervised learning by automatically or semi-automatically labeling "unlabeled" data with positive samples from e.g. a relational database and a knowledge base.
  • For example, supervised learning for information extraction (e.g. event extraction) requires texts in which events are labelled, while distant-supervised learning assumes an event database and an unlabelled text corpus, automatically or semi-automatically annotates the events of the database on the texts of the corpus, and uses the labelled data as the training data of supervised learning.
  • the cost of automatically or semi-automatically labelling unlabelled text with known positive samples is usually much lower than that of manually labelling texts. Therefore, it may be desirable to use distant-supervised learning to construct machine learning based models for semantic text coherence analysis.
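  • By way of illustration, the following is a minimal Python sketch of distant supervision: sentences that mention both entities of a known database event are auto-labeled as positive training samples. The event database and corpus here are invented for illustration, not drawn from the disclosure:

```python
# Minimal sketch of distant supervision: sentences mentioning both entities
# of a known database event are auto-labeled as positive training samples.
# The event database and corpus below are invented for illustration.
known_acquisitions = {("Google", "YouTube"), ("Facebook", "WhatsApp")}

corpus = [
    "Google acquired YouTube in 2006.",
    "Facebook bought WhatsApp for $19 billion.",
    "Google released a new phone.",
]

labeled = [
    (sentence, acquirer, acquired, "positive")
    for sentence in corpus
    for acquirer, acquired in known_acquisitions
    if acquirer in sentence and acquired in sentence
]
print(labeled)  # two positive samples obtained without manual annotation
```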
  • because text is inherently ambiguous and the same meaning may be conveyed by different text structures, it may be desirable to use semantic text coherence analysis to identify text structures that convey the same meaning (e.g., to resolve pronouns with precedent referents). It may also be desirable to use semantic text coherence analysis to populate a knowledge base with unknown knowledge (e.g., ontology learning).
  • a method, a computer-readable medium, and an apparatus for pronoun resolution may identify a set of subjects in a first sentence and a pronoun in a second sentence.
  • the apparatus may generate a set of tuples based on the first sentence and the second sentence.
  • Each tuple of the set of tuples may include a subject of the set of subjects, the pronoun, a first subgraph of a first dependency graph corresponding to the first sentence, and a second subgraph of a second dependency graph corresponding to the second sentence.
  • the apparatus may identify a trained machine learning based model corresponding to the tuple.
  • the apparatus may determine a validation score of the tuple using the trained machine learning based model corresponding to the tuple.
  • the apparatus may determine a referent of the pronoun based on the validation scores.
  • a method, a computer-readable medium, and an apparatus for knowledge base completion are provided.
  • the apparatus may identify a plurality of heads and tails, and a plurality of relations in a knowledge base including a plurality of triples.
  • Each triple of the plurality of triples may include a head, a tail, and a relation between the head and the tail.
  • the apparatus may generate a candidate triple that includes a candidate head, a candidate tail, and a candidate relation.
  • the candidate head and the candidate tail may be selected from the plurality of heads and tails.
  • the candidate relation may be selected from the plurality of relations.
  • the candidate triple may be outside of the plurality of triples, thus not part of the knowledge base.
  • the apparatus may estimate the validity of a sentence formed based on the candidate triple.
  • the apparatus may add the candidate triple into the knowledge base when the sentence formed based on the candidate triple is validated.
  • the one or more aspects include the features hereinafter fully described and particularly pointed out in the claims.
  • the following description and the annexed drawings set forth in detail certain illustrative features of the one or more aspects. These features are indicative, however, of but a few of the various ways in which the principles of various aspects may be employed, and this description is intended to include all such aspects and their equivalents.
  • FIG. 1 is a diagram illustrating an example of two dependency graphs.
  • FIG. 2 is a diagram illustrating an example of two respective subgraphs of the dependency graphs described above in FIG. 1.
  • FIG. 3 is a diagram illustrating an example of two respective subgraph patterns of the subgraphs described above in FIG. 2.
  • FIG. 4 is a diagram illustrating an example of a joined pattern based on the subgraph patterns described above in FIG. 3.
  • FIG. 5 is a flowchart of a method of constructing machine learning based models for semantic text coherence analysis.
  • FIG. 6 is a diagram illustrating an example of a dependency graph corresponding to a candidate sentence.
  • FIG. 7 is a diagram illustrating an example of the dependency graph described above in FIG. 6 being split into two subgraphs.
  • FIG. 8 is a diagram illustrating an example of two respective subgraphs of the subgraphs described above in FIG. 7.
  • FIG. 9 is a diagram illustrating an example of a joined pattern based on the subgraph patterns described above in FIG. 8.
  • FIG. 10 is a flowchart of a method of validating a syntactic structure.
  • FIG. 11 is a flowchart of a method of pronoun resolution.
  • FIG. 12 is a flowchart of a method of knowledge base completion.
  • FIG. 13 is a conceptual data flow diagram illustrating the data flow between different means/components in an exemplary apparatus.
  • FIG. 14 depicts a schematic drawing of an exemplary computer system.
  • processors include microprocessors, microcontrollers, graphics processing units (GPUs), central processing units (CPUs), application processors, digital signal processors (DSPs), reduced instruction set computing (RISC) processors, systems on a chip (SoC), baseband processors, field programmable gate arrays (FPGAs), programmable logic devices (PLDs), state machines, gated logic, discrete hardware circuits, and other suitable hardware configured to perform the various functionality described throughout this disclosure.
  • processors in the processing system may execute software.
  • Software shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software components, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.
  • the functions described may be implemented in hardware, software, or any combination thereof. If implemented in software, the functions may be stored on or encoded as one or more instructions or code on a computer-readable medium.
  • Computer-readable media includes computer storage media. Storage media may be any available media that can be accessed by a computer.
  • such computer-readable media may include a random-access memory (RAM), a read-only memory (ROM), an electrically erasable programmable ROM (EEPROM), optical disk storage, magnetic disk storage, other magnetic storage devices, combinations of the aforementioned types of computer-readable media, or any other medium that can be used to store computer executable code in the form of instructions or data structures that can be accessed by a computer.
  • One aspect of this disclosure is a distant-supervised method of constructing machine learning based models over coherent syntactic structures of valid sentences.
  • the structural coherence of automatically generated sentences is estimated using the constructed machine learning based models in order to support applications that test the validity of candidate sentences and syntactic structures. Such applications may include pronoun resolution and knowledge base completion.
  • the method of one embodiment works in a distant-supervised manner, such that it requires only positive samples and unlabeled sentences, but neither negative samples nor labels on sentences. As a result, the costly manual labelling of sentences and the collection of negative samples may be avoided. Also, the collection of positive samples may be semi-automated, which will be further described below.
  • the task of pronoun resolution (or pronoun disambiguation) is to locate the referent (or antecedent) of a given pronoun in its earlier context.
  • in terms of structural coherence, if a pronoun and a candidate referent have coherent syntactic contexts, the two may match and the candidate referent may be identified as a correct referent of the pronoun.
  • An existing knowledge base (KB) may consist of triples (e.g. Obama-is_a-US_president).
  • Knowledge base completion is the task of adding more relevant triples to the KB. Specifically, given a candidate triple, knowledge base completion determines whether the candidate is relevant to the KB or not.
  • Two assumptions are made as follows: 1) a candidate triple is generated by replacing one of the three elements in a known triple (i.e. a triple already registered to the KB) with an unknown element (e.g. Clinton is an unknown element to the triple of Obama-is_a-US_president), where the name of the unknown element is designated as u; 2) each triple is assumed to have a textual description in the form of a sentence (e.g. "Obama is a US president"), where the sentence is designated as s.
  • Pronoun resolution is a fundamental task of natural language processing, which has many applications in discourse analysis including question answering and information extraction.
  • Knowledge bases have formalized answers to questions and are thus computationally efficient for question answering, while answer-bearing texts (e.g. Web pages) are unstructured, thus computationally inefficient.
  • the completion (or population) of knowledge bases can enhance efficient methods of KB-based question answering.
  • the positive samples of pronoun resolution required for constructing machine learning based models for structural coherence measurement may be constructed by mining unlabeled, unstructured texts using a nominal coreference resolution system that correlates nouns and noun phrases (e.g. the president - US President).
  • nominal coreference resolution is usually easier, and thus shows higher accuracy, than pronoun resolution.
  • the positive samples of knowledge base completion may be constructed by manually giving a textual description not to each triple (e.g. Obama-is_a-US_president), but to each unique relationship type (e.g. is_a), which can be used to generate a textual description for each triple (e.g. "Obama is a US president").
  • machine learning based models may be constructed for structural coherence measurement.
  • the construction process may work in an offline manner (i.e., done before the applications of the models).
  • the construction process may take as inputs a text corpus and a nominal coreference resolution system, and generate machine learning based models for structural coherence measurement.
  • FIGS. 1-4 describe an example of constructing machine learning based models for structural coherence measurement.
  • a list of coreferences of m may be located within a given local window of m (for example, three sentences before or after the noun m).
  • the list of coreferences of m may be designated as coreferences(m).
  • the text corpus may include two sentences. The first sentence is "A man looks at a cat." The second sentence is "The cat is playing a ball." When m is 'cat' in the second sentence, coreferences(m) has one element of 'cat' in the first sentence.
  • Another noun m2, which is different from m and m1 in terms of their lemmas, may be randomly selected.
  • the mention pair {m, m1} may be labeled as positive because m1 is a coreference of m, and the mention pair {m, m2} may be labeled as negative because m2 is not a coreference of m.
  • the mention pair {'cat' in the second sentence, 'cat' in the first sentence} may be labeled as positive
  • the mention pair {'cat' in the second sentence, 'man' in the first sentence} may be labeled as negative.
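  • The sample-generation step above may be sketched as follows; the coreference output and the (word, sentence index) mention encoding are illustrative assumptions standing in for an off-the-shelf nominal coreference resolution system:

```python
import random

# Hypothetical output of a nominal coreference resolution system for the
# two-sentence example; mentions are (word, sentence_index) pairs and the
# surface form stands in for a proper lemma comparison.
coreferences = {("cat", 2): [("cat", 1)]}
nouns = [("man", 1), ("cat", 1), ("cat", 2)]

pairs = []
for m, corefs in coreferences.items():
    for m1 in corefs:
        pairs.append((m, m1, "positive"))
        # randomly pick a noun m2 whose lemma differs from both m and m1
        candidates = [n for n in nouns if n[0] not in (m[0], m1[0])]
        m2 = random.choice(candidates)
        pairs.append((m, m2, "negative"))

print(pairs)
# [(('cat', 2), ('cat', 1), 'positive'), (('cat', 2), ('man', 1), 'negative')]
```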
  • FIG. 1 is a diagram illustrating an example of two dependency graphs 100 and 120.
  • the first sentence is parsed into the dependency graph 100
  • the second sentence is parsed into the dependency graph 120.
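  • As an illustration, an off-the-shelf parser such as spaCy (one possible choice, not mandated by the disclosure; the exact arcs depend on the model version) can produce the dependency edges for the two example sentences:

```python
import spacy

# One possible off-the-shelf dependency parser.
# Requires: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

for text in ("A man looks at a cat.", "The cat is playing a ball."):
    doc = nlp(text)
    for token in doc:
        # each head -> dependent arc is one edge of the dependency graph
        print(f"{token.head.text} -{token.dep_}-> {token.text}")
    print()
```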
  • FIG. 2 is a diagram illustrating an example of two respective subgraphs 200 and 220 of the dependency graphs 100 and 120 described above in FIG. 1.
  • the subgraphs 200 and 220 are examples of gi and g'j, respectively.
  • an example tuple may be ('cat' in the second sentence, 'cat' in the first sentence, positive, subgraph 200, subgraph 220).
  • the two words of m and m' in gi and g'j may be replaced with a first label (e.g., NodeRef), which means that m and m' coreference each other, and all the other words may be replaced with a second label (e.g., Node). Consequently, two new subgraphs called "subgraph patterns" may be generated.
  • the subgraph patterns may be designated as pattern(gi, m) and pattern(g'j, m').
  • FIG. 3 is a diagram illustrating an example of two respective subgraph patterns 300 and 320 of the subgraphs 200 and 220 described above in FIG. 2.
  • the subgraph pattern 300 may be designated as pattern(gi, cat)
  • the subgraph pattern 320 may be designated as pattern(g'j, cat).
  • the subgraph patterns 300 and 320 are examples of subgraph patterns of gi and g'j, respectively.
  • the two subgraph patterns 300 and 320 may be combined to form a joined pattern.
  • the two NodeRef nodes of the subgraph patterns 300 and 320 may be merged into a single node.
  • the graph that combines pattern(gi, m) and pattern(g'j, m') may be referred to as pattern_join(gi, m, g'j, m'). All such pattern_join graphs may be collected.
  • the collection of pattern_join graphs may be referred to as {p1, p2, ..., pk} and the subset of tuples that correspond to pi may be designated as Ti.
  • the whole tuple collection T is thus subcategorized into {T1, T2, ..., Tk}.
  • FIG. 4 is a diagram illustrating an example of a joined pattern 400 based on the subgraph patterns 300 and 320 described above in FIG. 3. As illustrated, the joined pattern 400 may be designated as pattern_join(gi, m, g'j, m').
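  • A minimal sketch of the pattern and pattern_join operations, assuming a subgraph is represented as a list of (head, relation, dependent) edges; representing the joined pattern as a hashable key (rather than an explicit merged graph) is an implementation choice for grouping tuples, and the helper names are illustrative:

```python
from collections import defaultdict

def pattern(subgraph, m):
    """Replace the pivot word m with 'NodeRef' and every other word with 'Node'."""
    relabel = lambda w: "NodeRef" if w == m else "Node"
    # sorting canonicalizes edge order so equal patterns compare equal
    return tuple(sorted((relabel(h), rel, relabel(d)) for h, rel, d in subgraph))

def pattern_join(g_i, m, g_j, m_prime):
    """Join two patterns; the shared NodeRef is the merged pivot node.

    A hashable pair of patterns is used instead of an explicit merged graph,
    since the joined pattern only serves as a grouping key below.
    """
    return (pattern(g_i, m), pattern(g_j, m_prime))

def group_by_pattern(tuples):
    """Subcategorize tuples (m, m', label, g_i, g'_j) into subsets T1, T2, ..."""
    groups = defaultdict(list)
    for m, m_prime, label, g_i, g_j in tuples:
        groups[pattern_join(g_i, m, g_j, m_prime)].append((m, m_prime, label, g_i, g_j))
    return groups
```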
  • the words in the subgraph gi except the word m may be collected into a word sequence following the original sequence of the words in the sentence, designated as (wi1, ..., win).
  • the words of g'j except m' may be collected into a word sequence following the original sequence of the words in the sentence, designated as (wj1, ..., wjn').
  • the two sequences may then be joined, with m placed at the beginning of the joined sequence, as in (m, wi1, ..., win, wj1, ..., wjn').
  • the word sequence may be converted into a joint vector vij by replacing each word with its word vector, which may be generated by a word embedding method (e.g., word2vec), and by joining all the word vectors into a joint vector. For example, if there are 5 words in the sequence and word vectors have 200 dimensions, the joint vector of the word sequence will have 1000 dimensions.
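  • A sketch of the joint-vector construction; the random lookup table below is a toy stand-in for trained word2vec vectors (e.g. from gensim):

```python
import numpy as np

DIM = 200
rng = np.random.default_rng(0)
# toy embedding table standing in for a trained word2vec model
embedding = {w: rng.standard_normal(DIM) for w in ("cat", "a", "man", "looks", "at")}

def joint_vector(word_sequence):
    """Concatenate the word vector of every word in the sequence."""
    return np.concatenate([embedding[w] for w in word_sequence])

v = joint_vector(["cat", "a", "man", "looks", "at"])
print(v.shape)  # (1000,) -- 5 words x 200 dimensions, as in the example above
```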
  • each tuple (m, m', l, gi, g'j) in the tuple subset may be replaced with a pair that includes the joint vector of qij (i.e., vij) and the label l.
  • the tuple subset Tk may thus have pairs of (vij, l).
  • the word sequences qij of all the tuples in each tuple subset Tk have the same length, i.e. the same number of words, designated as length(Tk).
  • a machine learning based model may be trained for each tuple subset Tk.
  • the associated pairs of (vij, l) may be provided as the training data set.
  • the joint vector vij may be provided as the input of the model, and the corresponding label l may be provided as the expected output of the model.
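  • A sketch of the per-subset training step, such as over the groups produced by the grouping sketch above; logistic regression is a stand-in, since the disclosure leaves the model family open, and each subset is assumed to contain both positive and negative samples:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_models(groups):
    """Train one model Xk per tuple subset Tk.

    groups: joined pattern -> list of (joint_vector, label) pairs. All joint
    vectors within a subset share the same length, so a fixed-size model per
    group suffices.
    """
    models = {}
    for pattern_key, samples in groups.items():
        X = np.stack([v for v, _ in samples])
        y = np.array([1 if label == "positive" else 0 for _, label in samples])
        model = LogisticRegression(max_iter=1000)
        model.fit(X, y)
        # at apply time, model.predict_proba(X)[:, 1] yields a score in (0, 1)
        models[pattern_key] = model
    return models
```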
  • FIG. 5 is a flowchart 500 of a method of constructing machine learning based models for semantic text coherence analysis.
  • the method may perform operations described above with reference to FIGS. 1-4.
  • the method may be performed by a computing device.
  • the device may automatically generate a plurality of tuples based on a text corpus.
  • the plurality of tuples may be the dataset T described above, and each tuple may be the tuple of (m, m', l, gi, g'j) described above.
  • Each tuple of the plurality of tuples may include a first subject (e.g., 'man'), a second subject (e.g., 'cat'), a relationship (e.g., l) between the first subject and the second subject, a first subgraph (e.g., gi) of a first dependency graph (e.g., g) corresponding to a first sentence that includes the first subject, and a second subgraph (e.g., g'j) of a second dependency graph (e.g., g') corresponding to a second sentence that includes the second subject.
  • the first sentence and the second sentence may be the same sentence or two different sentences.
  • the second subject may be a coreference of the first subject within a local window of the first subject in the text corpus, and the relationship between the first subject and the second subject is positive. In one embodiment, the second subject may not be a coreference of the first subject in the text corpus, and the relationship between the first subject and the second subject is negative.
  • the device may further parse the first sentence into the first dependency graph, and parse the second sentence into the second dependency graph.
  • the first subgraph may include the first subject and the second subgraph may include the second subject.
  • the device may normalize the tuple by replacing the first subject and the second subject with a first label (e.g., NodeRef) and replacing other words within the first subgraph and the second subgraph with a second label (e.g., Node).
  • the device may merge the first subgraph and the second subgraph via the first label to obtain a joined pattern (e.g., pattern_join(gi, m, g'j, m')).
  • the device may classify the plurality of tuples into a plurality of groups (e.g., tuple subsets {T1, T2, ...}) based on the joined pattern associated with each tuple.
  • the device may generate a joined word sequence (e.g., qij) that includes the first subject, a first set of words in the first subgraph except the first subject, and a second set of words in the second subgraph except the second subject. The first subject may be placed in the beginning of the joined word sequence followed by the first set of words and the second set of words.
  • the first set of words may follow the original sequence of words in the first sentence and the second set of words may follow the original sequence of words in the second sentence.
  • the device may further convert the joined word sequence into a joint word vector (e.g., vij).
  • the device may train a machine learning based model (e.g., Xk) based on tuples classified into the group.
  • the device may train the machine learning based model with the joint word vector as input and the relationship between the first subject and the second subject as the expected output.
  • the validity of a given sentence may be checked by using the machine learning based models (e.g., X) described above with reference to FIGS. 1-5.
  • FIGS. 6-9 describe an example of applying the trained machine learning based models to candidate sentences or syntactic structures.
  • the candidate sentence s is "A plaster cat is playing a ball" and the noun n is 'cat'.
  • the sentence s may be parsed into a dependency graph g by using an off-the-shelf parser.
  • FIG. 6 is a diagram illustrating an example of a dependency graph 600 corresponding to the candidate sentence s. As illustrated, the dependency graph 600 may be designated as g.
  • the dependency graph g may be split into two subgraphs g' and g" at the noun n.
  • the node of noun n may be copied as n' (included in g') and n'' (included in g'').
  • FIG. 7 is a diagram illustrating an example of the dependency graph 600 described above in FIG. 6 being split into two subgraphs 700 and 720. As illustrated, the subgraphs 700 and 720 may be designated as g' and g", respectively.
  • both the node n' in subgraph 700 and the node n'' in subgraph 720 are the word 'cat'.
  • two sets of subgraphs G' and G'' may be collected.
  • the set of subgraphs G' includes g'1, g'2, ..., g'm, which are subgraphs of g'.
  • the set of subgraphs G'' includes g''1, g''2, ..., g''n, which are subgraphs of g''.
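  • A minimal sketch of the split-and-duplicate step using networkx; keying nodes by word strings (with 'a1'/'a2' disambiguating repeated words) is a simplification for illustration, and a real implementation would key nodes by token position:

```python
import networkx as nx

def split_at(tree: nx.DiGraph, n: str):
    """Split a dependency tree at the pivot noun n by duplicating its node."""
    g = tree.copy()
    g.add_node(n + "''")                     # the duplicate pivot node n''
    for child in list(g.successors(n)):      # n'' takes over the subtree below n
        g.remove_edge(n, child)
        g.add_edge(n + "''", child)
    # the original node (now playing the role of n') keeps the structure above
    return [g.subgraph(c).copy() for c in nx.weakly_connected_components(g)]

# "A plaster cat is playing a ball", with head -> dependent edges
tree = nx.DiGraph([("playing", "cat"), ("cat", "a1"), ("cat", "plaster"),
                   ("playing", "is"), ("playing", "ball"), ("ball", "a2")])
part1, part2 = split_at(tree, "cat")
print(sorted(part1), sorted(part2))  # one part keeps 'cat', the other "cat''"
```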
  • FIG. 8 is a diagram illustrating an example of two respective subgraphs 800 and 820 of the subgraphs 700 and 720 described above in FIG. 7.
  • the subgraphs 800 and 820 are examples of g'i and g''j, respectively.
  • a test tuple (n', n'', ?, g'i, g''j) may be generated.
  • an example tuple may be ('cat' in the subgraph 800, 'cat' in the subgraph 820, ?, the subgraph 800, the subgraph 820).
  • the two words of n' and n'' in g'i and g''j may be replaced with a first label (e.g., NodeRef), and all the other words may be replaced with a second label (e.g., Node). Consequently, two new subgraphs called "subgraph patterns" may be generated.
  • the subgraph patterns may be designated as pattern(g'i, n') and pattern(g''j, n'').
  • the two subgraph patterns pattern(g'i, n') and pattern(g''j, n'') may be combined to form a joined pattern.
  • the two NodeRef nodes of the subgraph patterns pattern(g'i, n') and pattern(g''j, n'') may be merged into a single node.
  • the graph that combines pattern(g'i, n') and pattern(g''j, n'') may be referred to as pattern_join(g'i, n', g''j, n'').
  • FIG. 9 is a diagram illustrating an example of a joined pattern 900 based on the subgraph patterns 800 and 820 described above in FIG. 8.
  • the joined pattern 900 may be designated as pattern_join(g'i, n', g''j, n'').
  • the pattern_join(g'i, n', g''j, n'') is generated in the same way as the method described above with reference to FIGS. 3 and 4.
  • the words in the subgraph g'i except the word n' may be collected into a word sequence following the original sequence of the words in the sentence.
  • the words of g''j except n'' may be collected into a word sequence following the original sequence of the words in the sentence.
  • the two sequences may then be joined, where n' is placed in the beginning of the joined sequence.
  • the joined sequence may be designated as qij.
  • the word sequence may be converted into a joint vector vij by replacing each word with its word vector, which may be generated by a word embedding method (e.g., word2vec), and by joining all the word vectors into a joint vector.
  • the joint vector vij is generated in the same way as the joint vector vij described above for constructing the machine learning based models.
  • for each test tuple, the corresponding model Xk is used to generate its regression score rk.
  • the correctness score r of the sentence s is assigned as the weighted average of the scores rk, i.e., r = (sum of rk × wk) / (sum of wk), where wk is the weight associated with model Xk.
  • the validation of a sentence s at a noun n may thus be formalized as a function validation-sentence(s, n) ∈ (0..1).
  • the sub-process of validating a test tuple may be formalized as a function validation-tuple(n', n'', g'i, g''j) ∈ (0..1).
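  • The weighted-average scoring may be sketched as below; how the weights wk are chosen is not pinned down above, so uniform weights are assumed in the example:

```python
# Sketch of the weighted-average correctness score: r = sum(rk * wk) / sum(wk).
def correctness_score(scored_tuples):
    """scored_tuples: (rk, wk) pairs, one per test tuple of the sentence."""
    total_weight = sum(w for _, w in scored_tuples)
    return sum(r * w for r, w in scored_tuples) / total_weight

print(correctness_score([(0.9, 1.0), (0.4, 1.0)]))  # 0.65
```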
  • FIG. 10 is a flowchart 1000 of a method of validating a syntactic structure.
  • the syntactic structure may be a sentence.
  • the method may perform operations described above with reference to FIGS. 6-9.
  • the method may be performed by a computing device.
  • the device may parse the syntactic structure into a dependency graph (e.g., the dependency graph 600).
  • the device may split the dependency graph into a first subgraph (e.g., the subgraph 700) and a second subgraph (e.g., the subgraph 720) based on a subject (e.g., 'cat') within the syntactic structure.
  • the subject may be included in both the first subgraph and the second subgraph.
  • the device may generate a plurality of tuples of the syntactic structure.
  • Each tuple (e.g., the tuple (n', n'', ?, g'i, g''j)) may be based on the subject, a subgraph of the first subgraph, and a subgraph of the second subgraph.
  • the device may identify a trained machine learning based model (e.g., Xk) corresponding to each tuple.
  • the device may normalize the tuple by replacing the subject with a first label and replacing other words within the first subgraph and the second subgraph with a second label.
  • the device may further merge the first subgraph and the second subgraph via the first label to obtain a joined pattern.
  • the device may then identify the trained machine learning based model based on the joined pattern.
  • the device may estimate validity of the syntactic structure using the trained machine learning based models of the tuples.
  • the device may generate a word vector for the tuple, and feed the word vector as input to the trained machine learning based model.
  • the device may generate a joined word sequence that includes the subject, a first set of words in the first subgraph except the subject, and a second set of words in the second subgraph except the subject. The subject may be placed in the beginning of the joined word sequence followed by the first set of words and the second set of words. The first set of words and the second set of words may follow the original sequence of words in the sentence. The device may further convert the joined word sequence into the word vector.
  • the device may determine the validity of the syntactic structure based on the outputs of the trained machine learning based models in response to the word vectors of the plurality of tuples of the syntactic structure. In one embodiment, to estimate the validity of the syntactic structure, the device may estimate the validity of the syntactic structure as the weighted sum of validity scores of the plurality of tuples of the syntactic structure.
  • the machine learning based models (e.g., X) described above with reference to FIGS. 1-5 may be applied to pronoun resolution.
  • Pronoun resolution may be treated as text coherence analysis. Thus, if the two syntactic structures of two sentences that contain a pronoun and its referent are coherent to each other, it is likely for the pronoun to refer to the referent.
  • for pronoun resolution, it is assumed that the input has one or two sentences, where the second (or the first, if only one) sentence has a pronoun p, and that the task is to identify the referent (or antecedent) of the pronoun in the first sentence.
  • all nouns and noun phrases in the first sentence may be located.
  • the validation process described above with reference to FIGS. 6-10 may be used to estimate the match between the two subgraphs by calculating validation-tuple (ni, p, g'j, gk).
  • the noun or noun phrase ni with the highest validation score for the given pronoun p may be considered as the correct referent of the pronoun p.
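  • A sketch of the referent-selection step; validate_tuple stands in for scoring a tuple with its trained model as described above, and averaging the per-candidate tuple scores is an assumption for illustration:

```python
def resolve_pronoun(candidates, p, subgraph_pairs, validate_tuple):
    """Pick the candidate noun with the highest validation score.

    candidates: nouns/noun phrases ni; subgraph_pairs[ni]: list of (g'j, gk)
    subgraph pairs; validate_tuple(ni, p, gj, gk) -> score in (0, 1).
    """
    def score(ni):
        s = [validate_tuple(ni, p, gj, gk) for gj, gk in subgraph_pairs[ni]]
        return sum(s) / len(s)   # mean aggregation is an illustrative choice
    return max(candidates, key=score)
```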
  • FIG. 11 is a flowchart 1100 of a method of pronoun resolution.
  • the method may be performed by a computing device.
  • the device may identify a set of subjects (e.g., N) in a first sentence and a pronoun (e.g., p) in a second sentence.
  • the set of subjects may include nouns and noun phrases.
  • the first sentence and the second sentence may be the same sentence or two different sentences.
  • the device may generate a set of tuples based on the first sentence and the second sentence.
  • Each tuple of the set of tuples may include a subject (e.g., ni) of the set of subjects, the pronoun, a first subgraph (e.g., g'j) of a first dependency graph corresponding to the first sentence, and a second subgraph (e.g., gk) of a second dependency graph corresponding to the second sentence.
  • the device may identify a trained machine learning based model corresponding to the tuple.
  • the device may normalize the tuple by replacing the subject and the pronoun with a first label and replacing other words within the first subgraph and the second subgraph with a second label.
  • the device may further merge the first subgraph and the second subgraph via the first label to obtain a joined pattern.
  • the device may then identify the trained machine learning based model based on the joined pattern.
  • the device may determine a validation score of the tuple using the trained machine learning based model corresponding to the tuple.
  • the device may generate a word vector for the tuple, and feed the word vector as input to the trained machine learning based model.
  • the device may generate a joined word sequence that includes the subject, a first set of words in the first subgraph except the subject, and a second set of words in the second subgraph except the pronoun.
  • the subject may be placed in the beginning of the joined word sequence followed by the first set of words and the second set of words.
  • the first set of words may follow the original sequence of words in the first sentence and the second set of words may follow the original sequence of words in the second sentence.
  • the device may further convert the joined word sequence into the word vector.
  • the device may determine a referent of the pronoun based on the validation scores.
  • a noun or noun phrase contained in a tuple that has the highest validation score is determined to be the referent of the pronoun.
  • the device may determine the validation score of each of the nouns and noun phrases based on outputs of the trained machine learning based models in response to the word vectors of tuples of the noun or noun phrase. The noun or noun phrase whose validation score is the highest may be determined to be the referent of the pronoun.
  • the machine learning based models (e.g., X) described above with reference to FIGS. 1-5 may be applied to knowledge base completion.
  • KB completion may be treated as text coherence analysis.
  • a knowledge base may consist of triples, designated as (head, relation, tail) or in short as (h, r, t). It is assumed that there are textual descriptions for each relation type r, designated as D(r), where such a description has two empty slots for head and tail.
  • the set of all heads and tails from the existing triples of the KB, designated as S, may be collected.
  • the set of all relation types of the KB, designated as R, may also be collected. If head or tail is not a noun but a noun phrase, the headword of the noun phrase may be identified and added to S; the noun phrase itself is not added to S. If head or tail is neither a noun nor a noun phrase (e.g., a verb phrase), the first noun in the head or tail may be identified and added to S. If head or tail has no noun, it may be discarded.
  • a set of candidate triples (h, r, t) may be generated, where h ∈ S, r ∈ R, t ∈ S, and (h, r, t) does not exist in the KB.
  • for each candidate triple (h, r, t), a textual description may be randomly selected from D(r), and its two empty slots may be replaced with h and t; the resulting sentence is designated as d.
  • the set of tuples (h, r, t, d) may be referred to as C.
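  • Candidate generation and description instantiation may be sketched as follows; the template format with {h} and {t} slots is an illustrative assumption for D(r), and the headword-extraction step for noun-phrase heads and tails is omitted:

```python
import random
from itertools import product

def generate_candidates(kb, descriptions):
    """Yield candidate tuples (h, r, t, d) not already in the KB."""
    S = {h for h, _, _ in kb} | {t for _, _, t in kb}   # all heads and tails
    R = {r for _, r, _ in kb}                           # all relation types
    for h, r, t in product(S, R, S):
        if (h, r, t) not in kb:                         # only unknown triples
            d = random.choice(descriptions[r]).format(h=h, t=t)
            yield (h, r, t, d)                          # a member of the set C

kb = {("Obama", "is_a", "US_president")}
descriptions = {"is_a": ["{h} is a {t}."]}
for candidate in generate_candidates(kb, descriptions):
    print(candidate)  # e.g. ('US_president', 'is_a', 'Obama', 'US_president is a Obama.')
```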
  • the correctness of the triple may be decided by using the validation process described above with reference to FIGS. 6-10. That is, the correctness of the triple may be determined by calculating validation-sentence(d, h).
  • if the validation score satisfies a threshold, the tuple may be considered positive (valid).
  • the candidate tuples (h, r, t, d) that are validated in the previous step may be collected, and the corresponding triples (h, r, t) may be added to the KB.
  • FIG. 12 is a flowchart 1200 of a method of knowledge base completion.
  • the method may be performed by a computing device.
  • the device may identify a plurality of heads and tails (e.g., S), and a plurality of relations (e.g., R) in a knowledge base including a plurality of triples, each triple including a head, a tail, and a relation between the head and the tail.
  • the device may generate a candidate triple (e.g., (h, r, t)) including a candidate head, a candidate tail, and a candidate relation.
  • the candidate head and the candidate tail may be selected from the plurality of heads and tails.
  • the candidate relation may be selected from the plurality of relations.
  • the candidate triple may be outside of the plurality of triples. That is, the candidate triple is not part of the knowledge base.
  • the device may estimate the validity of a sentence formed based on the candidate triple.
  • the device may parse the sentence into a dependency graph.
  • the device may further split the dependency graph into a first subgraph and a second subgraph at the candidate head.
  • the candidate head of a tuple may be included in both the first subgraph and the second subgraph of the tuple.
  • the device may further generate a set of tuples for the dependency graph. Each tuple may be based on the candidate head, a subgraph of the first subgraph, and a subgraph of the second subgraph.
  • the device may further identify a trained machine learning based model corresponding to each tuple.
  • the device may then estimate the validity of the sentence using the trained machine learning based models of the set of tuples for the dependency graph.
  • the device may normalize the tuple by replacing the candidate head with a first label and replacing other words within the first subgraph and the second subgraph with a second label.
  • the device may further merge the first subgraph and the second subgraph via the first label to obtain a joined pattern.
  • the device may then identify the trained machine learning based model based on the joined pattern.
  • the device may generate a word vector for the tuple, and feed the word vector as input to the trained machine learning based model.
  • the device may generate a joined word sequence that includes the candidate head, a first set of words in the first subgraph except the candidate head, and a second set of words in the second subgraph except the candidate head.
  • the candidate head may be placed in the beginning of the joined word sequence followed by the first set of words and the second set of words.
  • the first set of words and the second set of words may follow the original sequence of words in the sentence.
  • the device may further convert the joined word sequence into the word vector.
  • the device may determine the validity of the sentence based on the outputs of the trained machine learning based models in response to the word vectors of the set of tuples.
  • the device may add the candidate triple into the knowledge base when the sentence is estimated to be valid. For example, the device may add the candidate triple into the knowledge base when the output of the trained machine learning based model satisfies a threshold (e.g., above 0.5).
  • FIG. 13 is a conceptual data flow diagram 1300 illustrating the data flow between different means/components in an exemplary apparatus 1302.
  • the apparatus 1302 may be a computing device.
  • the apparatus 1302 may include a training component 1304 and/or an application component 1306.
  • the training component 1304 may receive a text corpus and construct machine learning based models in a distant-supervised manner based on the text corpus. In one configuration, the training component 1304 may perform the operations described above with reference to FIG. 5.
  • the application component 1306 may perform semantic coherence analysis of texts based on the trained machine learning based models, which may be constructed by the apparatus 1302 or one or more different devices. In one configuration, the application component 1306 may perform the operations described above with reference to FIG. 10, 11, or 12.
  • the apparatus may include additional components that perform each of the blocks of the algorithm in the aforementioned flowcharts of FIGS. 5, 10, 11, 12. As such, each block in the aforementioned flowcharts of FIGS. 5, 10, 11, 12 may be performed by a component and the apparatus may include one or more of those components.
  • the components may be one or more hardware components specifically configured to carry out the stated processes/algorithm, implemented by a processor configured to perform the stated processes/algorithm, stored within a computer-readable medium for implementation by a processor, or some combination thereof.
  • the methods or functional modules of the various example embodiments as described hereinbefore may be implemented on a computer system, such as a computer system 1400 as schematically shown in FIG. 14 as an example only.
  • the method or functional module may be implemented as software, such as a computer program being executed within the computer system 1400, and instructing the computer system 1400 to conduct the method of various example embodiments.
  • the computer system 1400 may include a computer module 1402, input modules such as a keyboard 1404 and mouse 1406 and a plurality of output devices such as a display 1408, and a printer 1410.
  • the computer module 1402 may be connected to a computer network 1412 via a suitable transceiver device 1414, to enable access to, e.g., the Internet or other network systems.
  • the computer module 1402 in the example may include a processor 1418 for executing various instructions, a Random Access Memory (RAM) 1420 and a Read Only Memory (ROM) 1422.
  • the computer module 1402 may also include a number of Input/Output (I/O) interfaces, for example I/O interface 1424 to the display 1408, and I/O interface 1426 to the keyboard 1404.
  • the components of the computer module 1402 typically communicate via an interconnected bus 1428 and in a manner known to the person skilled in the relevant art.
  • Combinations such as "at least one of A, B, or C," "one or more of A, B, or C," "at least one of A, B, and C," "one or more of A, B, and C," and "A, B, C, or any combination thereof" include any combination of A, B, and/or C, and may include multiples of A, multiples of B, or multiples of C.
  • Combinations such as "at least one of A, B, or C," "one or more of A, B, or C," "at least one of A, B, and C," "one or more of A, B, and C," and "A, B, C, or any combination thereof" may be A only, B only, C only, A and B, A and C, B and C, or A and B and C, where any such combinations may contain one or more member or members of A, B, or C.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

A method, a computer-readable medium, and an apparatus for constructing machine learning based models for pronoun resolution are provided. The apparatus may identify a set of subjects in a first sentence and a pronoun in a second sentence. The apparatus may generate a set of tuples. Each tuple may include a subject of the set of subjects, the pronoun, a first subgraph of a first dependency graph corresponding to the first sentence, and a second subgraph of a second dependency graph corresponding to the second sentence. For each tuple, the apparatus may identify a trained machine learning based model corresponding to the tuple. For each tuple, the apparatus may determine a validation score of the tuple using the trained machine learning based model corresponding to the tuple. The apparatus may determine a referent of the pronoun based on the validation scores.

Description

METHOD AND APPARATUS FOR SEMANTIC COHERENCE ANALYSIS OF TEXTS
TECHNICAL FIELD
[0001] Various aspects of this disclosure generally relate to machine learning, and more particularly, to the training and applications of machine learning based models for semantic coherence analysis of texts.
BACKGROUND
[0002] Machine learning is a subfield of computer science that explores the study and construction of algorithms that can learn from and make predictions on data. Such algorithms make data-driven predictions or decisions through building a model from sample inputs. Machine learning is employed in a range of computing tasks where designing and programming explicit algorithms is infeasible. Within the field of data analytics, machine learning is a method used to devise complex models and algorithms that lend themselves to prediction. These analytical models allow researchers, data scientists, engineers, and analysts to produce reliable, repeatable decisions and results and uncover hidden insights through learning from historical relationships and trends in the data.
[0003] It may be desirable to use machine learning to discover new and useful insights from large volumes of free text. However, because free text is unstructured and semantically ambiguous, standard data analytics methods do not directly apply. Therefore, it may be necessary to go beyond just keywords and develop advanced text analytics methods.
[0004] Semantic text analytics aim to analyze text by making sense of context, meaning, and domain knowledge. Typically, semantic text analytics use natural language processing (NLP), ontologies, and/or machine learning (e.g., deep learning) approaches. Various applications of semantic text analytics include information extraction, questions and answers (Q&A) systems, machine translation, etc.
[0005] Many text analytics problems involve the process of determining whether two text structures are coherent to each other when connected via a common term (pivot). Such a process may be referred to as semantic text coherence analysis. Semantic text coherence analysis may be used in areas such as text summarization, information extraction, Q&A systems, dialogue systems, machine translation, and so on.
[0006] Supervised learning is the machine learning task of inferring a function from labeled training data. It is costly and time consuming to annotate training data sets manually. Distant-supervised machine learning is the machine learning task of generating the labeled training data required for supervised learning by automatically or semi-automatically labeling "unlabeled" data with positive samples from e.g. a relational database and a knowledge base. For example, supervised learning for information extraction (e.g. event extraction) requires texts in which events (e.g. (acquiring company, business acquisition keyword, company acquired)) are labelled, while distant-supervised learning assumes an event database and an unlabelled text corpus, automatically or semi-automatically annotates the events of the database on the texts of the corpus, and uses the labelled data as the training data of supervised learning. Note that the cost of automatically or semi-automatically labelling unlabelled text with known positive samples is usually much lower than that of manually labelling texts. Therefore, it may be desirable to use distant-supervised learning to construct machine learning based models for semantic text coherence analysis.
[0007] Because text is inherently ambiguous, and the same meaning may be conveyed by different text structures, it may be desirable to use semantic text coherence analysis to identify text structures that convey the same meaning (e.g., to resolve pronouns with precedent referents). It may also be desirable to use semantic text coherence analysis to populate a knowledge base with unknown knowledge (e.g., ontology learning).
SUMMARY
[0008] The following presents a simplified summary of one or more aspects in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.
[0009] In an aspect of the disclosure, a method, a computer-readable medium, and an apparatus for pronoun resolution are provided. The apparatus may identify a set of subjects in a first sentence and a pronoun in a second sentence. The apparatus may generate a set of tuples based on the first sentence and the second sentence. Each tuple of the set of tuples may include a subject of the set of subjects, the pronoun, a first subgraph of a first dependency graph corresponding to the first sentence, and a second subgraph of a second dependency graph corresponding to the second sentence. For each tuple of the set of tuples, the apparatus may identify a trained machine learning based model corresponding to the tuple. For each tuple of the set of tuples, the apparatus may determine a validation score of the tuple using the trained machine learning based model corresponding to the tuple. The apparatus may determine a referent of the pronoun based on the validation scores.
[0010] In another aspect of the disclosure, a method, a computer-readable medium, and an apparatus for knowledge base completion are provided. The apparatus may identify a plurality of heads and tails, and a plurality of relations in a knowledge base including a plurality of triples. Each triple of the plurality of triples may include a head, a tail, and a relation between the head and the tail. The apparatus may generate a candidate triple that includes a candidate head, a candidate tail, and a candidate relation. The candidate head and the candidate tail may be selected from the plurality of heads and tails. The candidate relation may be selected from the plurality of relations. The candidate triple may be outside of the plurality of triples, thus not part of the knowledge base. The apparatus may estimate the validity of a sentence formed based on the candidate triple. The apparatus may add the candidate triple into the knowledge base when the sentence formed based on the candidate triple is validated.
[0011] To the accomplishment of the foregoing and related ends, the one or more aspects include the features hereinafter fully described and particularly pointed out in the claims. The following description and the annexed drawings set forth in detail certain illustrative features of the one or more aspects. These features are indicative, however, of but a few of the various ways in which the principles of various aspects may be employed, and this description is intended to include all such aspects and their equivalents.
BRIEF DESCRIPTION OF THE DRAWINGS
[0012] FIG. 1 is a diagram illustrating an example of two dependency graphs.
[0013] FIG. 2 is a diagram illustrating an example of two respective subgraphs of the dependency graphs described above in FIG. 1.
[0014] FIG. 3 is a diagram illustrating an example of two respective subgraph patterns of the subgraphs described above in FIG. 2.
[0015] FIG. 4 is a diagram illustrating an example of a joined pattern based on the subgraph patterns described above in FIG. 3.
[0016] FIG. 5 is a flowchart of a method of constructing machine learning based models for semantic text coherence analysis.
[0017] FIG. 6 is a diagram illustrating an example of a dependency graph corresponding to a candidate sentence.
[0018] FIG. 7 is a diagram illustrating an example of the dependency graph described above in FIG. 6 being split into two subgraphs.
[0019] FIG. 8 is a diagram illustrating an example of two respective subgraphs of the subgraphs described above in FIG. 7.
[0020] FIG. 9 is a diagram illustrating an example of a joined pattern based on the subgraph patterns described above in FIG. 8.
[0021] FIG. 10 is a flowchart of a method of validating a syntactic structure.
[0022] FIG. 11 is a flowchart of a method of pronoun resolution.
[0023] FIG. 12 is a flowchart of a method of knowledge base completion.
[0024] FIG. 13 is a conceptual data flow diagram illustrating the data flow between different means/components in an exemplary apparatus.
[0025] FIG. 14 depicts a schematic drawing of an exemplary computer system.
DETAILED DESCRIPTION
[0026] The detailed description set forth below in connection with the appended drawings is intended as a description of various configurations and is not intended to represent the only configurations in which the concepts described herein may be practiced. The detailed description includes specific details for the purpose of providing a thorough understanding of various concepts. However, it will be apparent to those skilled in the art that these concepts may be practiced without these specific details. In some instances, well known structures and components are shown in block diagram form in order to avoid obscuring such concepts.
[0027] Several aspects of computing systems for machine learning will now be presented with reference to various apparatus and methods. The apparatus and methods will be described in the following detailed description and illustrated in the accompanying drawings by various blocks, components, circuits, processes, algorithms, etc. (collectively referred to as "elements"). The elements may be implemented using electronic hardware, computer software, or any combination thereof. Whether such elements are implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system.
[0028] By way of example, an element, or any portion of an element, or any combination of elements may be implemented as a "processing system" that includes one or more processors. Examples of processors include microprocessors, microcontrollers, graphics processing units (GPUs), central processing units (CPUs), application processors, digital signal processors (DSPs), reduced instruction set computing (RISC) processors, systems on a chip (SoC), baseband processors, field programmable gate arrays (FPGAs), programmable logic devices (PLDs), state machines, gated logic, discrete hardware circuits, and other suitable hardware configured to perform the various functionality described throughout this disclosure. One or more processors in the processing system may execute software. Software shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software components, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.
[0029] Accordingly, in one or more example embodiments, the functions described may be implemented in hardware, software, or any combination thereof. If implemented in software, the functions may be stored on or encoded as one or more instructions or code on a computer-readable medium. Computer-readable media includes computer storage media. Storage media may be any available media that can be accessed by a computer. By way of example, and not limitation, such computer-readable media may include a random-access memory (RAM), a read-only memory (ROM), an electrically erasable programmable ROM (EEPROM), optical disk storage, magnetic disk storage, other magnetic storage devices, combinations of the aforementioned types of computer-readable media, or any other medium that can be used to store computer executable code in the form of instructions or data structures that can be accessed by a computer.
[0030] One aspect of this disclosure is a distant-supervised method of constructing machine learning based models over coherent syntactic structures of valid sentences. In another aspect, the structural coherence of automatically generated sentences is estimated using the constructed machine learning based models in order to support applications that test the validity of candidate sentences and syntactic structures. Such applications may include pronoun resolution and knowledge base completion. In contrast to supervised methods, the method of one embodiment works in a distant-supervised manner, such that it requires only positive samples and unlabeled sentences, but neither negative samples nor labels on sentences. As a result, the costly manual labelling of sentences and the collection of negative samples may be avoided. Also, the collection of positive samples may be semi-automated, which will be further described below.
[0031] The task of pronoun resolution (or pronoun disambiguation) is to locate the referent (or antecedent) of a given pronoun in its earlier context. In terms of structural coherence, if a pronoun and a candidate referent have coherent syntactic contexts, the two may match and the candidate referent may be identified as a correct referent of the pronoun.
[0032] An existing knowledge base (KB) may consist of triples (e.g. Obama-is_a-US_president). Knowledge base completion is the task of adding more relevant triples to the KB. Specifically, given a candidate triple, knowledge base completion determines whether the candidate is relevant to the KB or not. Two assumptions are made as follows: 1) a candidate triple is generated by replacing one of the three elements in a known triple (i.e. a triple already registered to the KB) with an unknown element (e.g. Clinton is an unknown element to the triple of Obama-is_a-US_president), where the name of the unknown element is designated as u; 2) each triple is assumed to have a textual description in the form of a sentence (e.g. "Obama is a US president"), where the sentence is designated as s. Based on the two assumptions, the task of knowledge base completion may be redefined as determining the validity of the sentence s in which the unknown element name u replaces the corresponding element. In terms of structural coherence, if the word u and the rest of the sentence s together form a coherent syntactic structure, a conclusion that the candidate triple is relevant to the KB may be drawn.
[0033] These applications are fundamental and have significant commercial impact. Pronoun resolution is a fundamental task of natural language processing with many applications in discourse analysis, including question answering and information extraction. Knowledge bases contain formalized answers to questions and are thus computationally efficient for question answering, while answer-bearing texts (e.g. Web pages) are unstructured and thus computationally inefficient to query. The completion (or population) of knowledge bases can therefore enhance efficient methods of KB-based question answering.
[0034] In one embodiment, the positive samples of pronoun resolution required for constructing machine learning based models for structural coherence measurement may be constructed by mining unlabeled, unstructured texts using a nominal coreference resolution system that correlates nouns and noun phrases (e.g. the president - US President). Note that, thanks to the shared or semantically related nouns between an anaphoric expression and its antecedent (e.g. president-president), nominal coreference resolution is usually easier, and thus shows higher accuracy, than pronoun resolution. In one embodiment, the positive samples of knowledge base completion may be constructed by manually giving a textual description not to each triple (e.g. Obama-is_a-US_president), but to each unique relationship type (e.g. is_a), which can be used to generate a textual description for each triple (e.g. "Obama is a US president"). Note that, for example, the DBpedia KB 2016-04 release has over 9 billion triples, but fewer than 3,000 relationship (property) types.
[0035] In one embodiment, machine learning based models may be constructed for structural coherence measurement. The construction process may work in an offline manner (i.e., done before the applications of the models). In one embodiment, the construction process takes as inputs a text corpus and a nominal coreference resolution system and generates machine learning based models for structural coherence measurement.
[0036] FIGS. 1-4 describe an example of constructing machine learning based models for structural coherence measurement. In one embodiment, for each noun m in the text corpus, a list of coreferences of m may be located within a given local window of m (for example, three sentences before or after the noun m). The list of coreferences of m may be designated as coreferences(m). For example, the text corpus may include two sentences. The first sentence is "A man looks at a cat." The second sentence is "The cat is playing a ball." When m is 'cat' in the second sentence, coreferences(m) has one element of 'cat' in the first sentence.
[0037] For each coreference m1 of m, another noun m2, which is different from m and m1 in terms of their lemmas, may be randomly selected. The mention pair {m, m1} may be labeled as positive because m1 is a coreference of m, and the mention pair {m, m2} may be labeled as negative because m2 is not a coreference of m. For example, the mention pair {'cat' in the second sentence, 'cat' in the first sentence} may be labeled as positive, and the mention pair {'cat' in the second sentence, 'man' in the first sentence} may be labeled as negative.
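By way of illustration, the following Python sketch shows one way the mention-pair labelling step above could be implemented. The data structures (a mention as a (sentence index, token index, lemma) triple) and the helper names are illustrative assumptions, not part of the disclosure; the list of coreferences is presumed to come from the nominal coreference resolution system mentioned earlier.

```python
import random
from typing import List, Tuple

Mention = Tuple[int, int, str]  # (sentence index, token index, lemma) -- assumed encoding

def label_mention_pairs(m: Mention,
                        corefs: List[Mention],
                        candidates: List[Mention]) -> List[Tuple[Mention, Mention, str]]:
    """For each coreference m1 of m, emit a positive pair {m, m1} and a
    negative pair {m, m2}, where m2 is a randomly selected noun whose
    lemma differs from both m and m1."""
    pairs = []
    for m1 in corefs:
        pairs.append((m, m1, "positive"))
        distractors = [c for c in candidates if c[2] not in (m[2], m1[2])]
        if distractors:  # only emit a negative pair when a lemma-distinct noun exists
            pairs.append((m, random.choice(distractors), "negative"))
    return pairs

# e.g. with m = 'cat' in sentence 2, corefs = ['cat' in sentence 1] and
# candidates = all nouns in the local window, the pair {'cat', 'cat'} is
# labeled positive and {'cat', 'man'} negative, as in the example above.
```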
[0038] Given a mention pair {m, m'} with a label l (positive or negative), the corresponding sentences s and s' of m and m' may be found, respectively. The sentences s and s' may be parsed into dependency graphs g and g', respectively. All the subgraphs of g that include m, designated as {g1, g2, ..., gm} ∈ G, may be identified. Similarly, all the subgraphs of g' that include m', designated as {g'1, g'2, ..., g'n} ∈ G', may be identified. For each pair of subgraphs gi ∈ G and g'j ∈ G', a tuple of (m, m', l, gi, g'j) may be generated. All these tuples may be collected as the dataset T.

[0039] FIG. 1 is a diagram illustrating an example of two dependency graphs 100 and 120. In this example, the dependency graphs 100 and 120 are g and g' corresponding to the mention pair {m = 'cat' in the second sentence, m' = 'cat' in the first sentence}. Essentially, the first sentence is parsed into the dependency graph 100, and the second sentence is parsed into the dependency graph 120.
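The enumeration of all subgraphs of g that include m can be costly, since the number of connected subgraphs grows exponentially with graph size. The following sketch therefore caps the subgraph size; the plain edge-list encoding and the cap are illustrative assumptions, not requirements of the disclosure.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

def connected_subgraphs(edges: List[Tuple[int, int]], seed: int,
                        max_nodes: int = 4) -> List[FrozenSet[int]]:
    """Enumerate node sets of connected subgraphs that contain `seed`,
    grown breadth-first one neighbour at a time, capped at max_nodes."""
    adj: Dict[int, Set[int]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    frontier = {frozenset([seed])}
    found = set(frontier)
    while frontier:
        grown_sets = set()
        for sub in frontier:
            if len(sub) >= max_nodes:
                continue
            for node in sub:
                for neighbour in adj.get(node, ()):
                    grown = sub | {neighbour}
                    if grown not in found:
                        found.add(grown)
                        grown_sets.add(grown)
        frontier = grown_sets
    return sorted(found, key=len)

def build_tuples(m, m_prime, label, subgraphs_g, subgraphs_g_prime):
    """Cartesian product over subgraph pairs -> tuples (m, m', l, gi, g'j),
    i.e. the dataset T for one labeled mention pair."""
    return [(m, m_prime, label, gi, gj)
            for gi in subgraphs_g for gj in subgraphs_g_prime]
```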
[0040] FIG. 2 is a diagram illustrating an example of two respective subgraphs 200 and 220 of the dependency graphs 100 and 120 described above in FIG. 1. As illustrated, the subgraphs 200 and 220 are examples of gi and g'j, respectively. Thus, an example tuple may be ('cat' in the second sentence, 'cat' in the first sentence, positive, subgraph 200, subgraph 220).
[0041] Given a tuple (m, m', l, gi, g'j), the two words of m and m' in gi and g'j may be replaced with a first label (e.g., NodeRef), which means that m and m' coreference each other, and all the other words may be replaced with a second label (e.g., Node). Consequently, two new subgraphs called "subgraph patterns" may be generated. The subgraph patterns may be designated as pattern(gi, m) and pattern(g'j, m').
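A minimal sketch of this normalization step follows, assuming each subgraph is given as a list of (token index, word) nodes plus dependency edges; only the node labels change, while the parsed structure is preserved.

```python
def to_pattern(subgraph_nodes, subgraph_edges, mention_index):
    """Replace the mention's word with 'NodeRef' and every other word with
    'Node'. Nodes are assumed to be (token_index, word) pairs; edges are
    (head index, dependent index, dep label) and pass through unchanged."""
    relabelled = {idx: ("NodeRef" if idx == mention_index else "Node")
                  for idx, _word in subgraph_nodes}
    return relabelled, list(subgraph_edges)
```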
[0042] FIG. 3 is a diagram illustrating an example of two respective subgraph patterns 300 and 320 of the subgraphs 200 and 220 described above in FIG. 2. As illustrated, the subgraph pattern 300 may be designated as pattern(gi, cat), and the subgraph pattern 320 may be designated as pattern(g'j, cat). The subgraph patterns 300 and 320 are examples of subgraph patterns of gi and g'j, respectively.
[0043] The two subgraph patterns 300 and 320 may be combined to form a joined pattern. For example, the two NodeRef nodes of the subgraph patterns 300 and 320 may be merged into a single node. The graph that combines pattern(gi, m) and pattern(g'j, m') may be referred to as pattern_join(gi, m, g'j, m'). All such pattern_join graphs may be collected. The collection of pattern_join graphs may be referred to as {p1, p2, ..., pK} and the subset of tuples that correspond to pi may be designated as Ti. As a result, the whole tuple collection T is subcategorized into {T1, T2, ..., TK}. Given a tuple subset Tk, the dependency structures of the joined patterns (i.e., pattern_join(gi, m, g'j, m')) in all the tuples of the subset Tk are identical.

[0044] FIG. 4 is a diagram illustrating an example of a joined pattern 400 based on the subgraph patterns 300 and 320 described above in FIG. 3. As illustrated, the joined pattern 400 may be designated as pattern_join(gi, m, g'j, m').
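The following sketch illustrates one way to merge the two NodeRef nodes and to use a canonical serialization of the joined graph as the grouping key for the tuple subsets Tk. The serialization scheme and the integer offsets are illustrative assumptions; any stable, hashable encoding of the joined pattern would serve.

```python
from collections import defaultdict

def pattern_join(pat_a, pat_b):
    """Each pattern is (node_labels: dict[int, str], edges: list[(h, d, dep)]).
    Both NodeRef nodes are renamed to the shared id 0, which merges them;
    the offsets keep the remaining node ids of the two graphs disjoint
    (assuming token indices stay below 9_999)."""
    def canon(pat, offset):
        labels, edges = pat
        remap = {i: (0 if labels[i] == "NodeRef" else i + offset) for i in labels}
        return [(remap[h], remap[d], dep) for h, d, dep in edges]
    return tuple(sorted(canon(pat_a, 1) + canon(pat_b, 10_000)))

def group_by_pattern(tuples_with_patterns):
    """tuples_with_patterns: iterable of (tuple, pattern_a, pattern_b).
    Returns dict: joined-pattern key -> tuple subset Tk."""
    groups = defaultdict(list)
    for tup, pat_a, pat_b in tuples_with_patterns:
        groups[pattern_join(pat_a, pat_b)].append(tup)
    return groups
```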
[0045] For each tuple (m, m', l, gi, g'j) ∈ Tk, the words in the subgraph gi except the word m may be collected into a word sequence following the original sequence of the words in the sentence, designated as (wi1, ..., win). Similarly, the words of g'j except m' may be collected into a word sequence following the original sequence of the words in the sentence, designated as (wj1, ..., wjn'). The two sequences may then be joined, where m is placed in the beginning of the joined sequence, as in (m, wi1, ..., win, wj1, ..., wjn'). The joined sequence may be designated as qij. For example, if (wi1, ..., win) = (man, looks) and (wj1, ..., wjn') = (playing, ball), then qij = (cat, man, looks, playing, ball).
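A sketch of this sequence-joining step follows, assuming words are carried as (token index, word) pairs so that "the original sequence of the words in the sentence" is simply a sort on the token index.

```python
def joined_sequence(m_word, gi_words, m_idx, gj_words, m2_idx):
    """gi_words / gj_words: lists of (token_index, word) for each subgraph;
    m_idx / m2_idx: token indices of the mentions to exclude. The mention
    word m is placed first, then each side's words in sentence order."""
    left = [w for i, w in sorted(gi_words) if i != m_idx]
    right = [w for i, w in sorted(gj_words) if i != m2_idx]
    return [m_word] + left + right

# e.g. joined_sequence("cat", [(0, "man"), (1, "looks"), (4, "cat")], 4,
#                      [(0, "cat"), (2, "playing"), (4, "ball")], 0)
# -> ["cat", "man", "looks", "playing", "ball"], matching qij above.
```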
[0046] Given qij, the word sequence may be converted into a joint vector vij by replacing each word with its word vector, which may be generated by a word embedding method (e.g., word2vec), and by joining all the word vectors into a joint vector. For example, if there are 5 words in the sequence and word vectors have 200 dimensions, the joint vector of the word sequence will have 1000 dimensions.
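A sketch of the vector-joining step: the embedding lookup is modelled here as a plain dictionary with a zero vector for out-of-vocabulary words, which is an assumption; in practice any word2vec-style model (e.g. gensim's KeyedVectors) supplies the word-to-vector mapping.

```python
import numpy as np

def to_joint_vector(words, embeddings, dim=200):
    """Concatenate per-word vectors into one joint vector: 5 words at
    200 dimensions each yield a 1000-dimensional vector, as above."""
    vecs = [embeddings.get(w, np.zeros(dim)) for w in words]
    return np.concatenate(vecs)
```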
[0047] Given a tuple subset Tk, each tuple (m, m', l, gi, g'j) in the tuple subset may be replaced with a pair that includes the joint vector of qij (i.e., vij) and the label l. As a result, the tuple subset Tk may have pairs of (vij, l).
[0048] The word sequences qij of all the tuples in each tuple subset Tk have the same length, i.e. the same number of words, designated as length(Tk). Each tuple subset Tk is assigned a weight Wk = e^length(Tk).
[0049] In one embodiment, a machine learning based model, designated as Xk, may be trained for each tuple subset Tk. In such an embodiment, for each tuple subset Tk, the associated pairs of (vij, l) may be provided as the training data set. For example, a joint vector of qij (i.e., vij) may be provided as an input to the model and the corresponding label l may be provided as the expected output of the model. As a result, machine learning based models {X1, ..., XK} ∈ X may be constructed.
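A sketch of the per-subset training loop follows. Logistic regression is an illustrative choice only; the disclosure requires merely a model that learns the mapping from joint vector to label and can emit a regression score between 0 and 1 at test time.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_models(groups):
    """groups: dict mapping joined-pattern key -> list of (joint_vector, label),
    with label in {"positive", "negative"}. Returns dict: key -> fitted model Xk."""
    models = {}
    for key, pairs in groups.items():
        X = np.vstack([v for v, _ in pairs])
        y = np.array([1 if lbl == "positive" else 0 for _, lbl in pairs])
        if len(set(y)) < 2:      # a classifier needs both classes to be trainable
            continue
        models[key] = LogisticRegression(max_iter=1000).fit(X, y)
    return models
```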
[0050] FIG. 5 is a flowchart 500 of a method of constructing machine learning based models for semantic text coherence analysis. In one embodiment, the method may perform operations described above with reference to FIGS. 1-4. The method may be performed by a computing device. At 502, the device may automatically generate a plurality of tuples based on a text corpus. For example, the plurality of tuples may be the dataset T described above, and each tuple may be the tuple of (m, m', l, gi, g'j) described above. Each tuple of the plurality of tuples may include a first subject (e.g., 'man'), a second subject (e.g., 'cat'), a relationship (e.g., l) between the first subject and the second subject, a first subgraph (e.g., gi) of a first dependency graph (e.g., g) corresponding to a first sentence that includes the first subject, and a second subgraph (e.g., g'j) of a second dependency graph (e.g., g') corresponding to a second sentence that includes the second subject. The first sentence and the second sentence may be the same sentence or two different sentences.
[0051] In one embodiment, the second subject may be a coreference of the first subject within a local window of the first subject in the text corpus, and the relationship between the first subject and the second subject is positive. In one embodiment, the second subject may not be a coreference of the first subject in the text corpus, and the relationship between the first subject and the second subject is negative.
[0052] In one embodiment, the device may further parse the first sentence into the first dependency graph, and parse the second sentence into the second dependency graph. In one embodiment, the first subgraph may include the first subject and the second subgraph may include the second subject.
[0053] At 504, for each tuple of the plurality of tuples, the device may normalize the tuple by replacing the first subject and the second subject with a first label (e.g., NodeRef) and replacing other words within the first subgraph and the second subgraph with a second label (e.g., Node).
[0054] At 506, for each normalized tuple of the plurality of normalized tuples, the device may merge the first subgraph and the second subgraph via the first label to obtain a joined pattern (e.g., pattern_join(gi, m, g'j, m')).
[0055] At 508, the device may classify the plurality of tuples into a plurality of groups (e.g., tuple subsets {T1, T2, ..., TK}) based on the joined pattern associated with each tuple.

[0056] In one embodiment, for each tuple of the plurality of tuples, the device may generate a joined word sequence (e.g., qij) that includes the first subject, a first set of words in the first subgraph except the first subject, and a second set of words in the second subgraph except the second subject. The first subject may be placed in the beginning of the joined word sequence followed by the first set of words and the second set of words. The first set of words may follow the original sequence of words in the first sentence and the second set of words may follow the original sequence of words in the second sentence. In one embodiment, for each tuple of the plurality of tuples, the device may further convert the joined word sequence into a joint word vector (e.g., vij).
[0057] At 510, for each group of the plurality of groups, the device may train a machine learning based model (e.g., Xk) based on tuples classified into the group. In one embodiment, to train the machine learning based model for each group, for each tuple classified into the group, the device may train the machine learning based model with the joint word vector as input and the relationship between the first subject and the second subject as the expected output.
[0058] In one embodiment, given a sentence s and a noun n in the sentence, the validity of the sentence may be checked by using the machine learning based models (e.g., X) described above with reference to FIGS. 1-5. FIGS. 6-9 describe an example of applying the trained machine learning based models to candidate sentences or syntactic structures. In this example, the candidate sentence s is "A plaster cat is playing a ball" and the noun n is 'cat'.
[0059] In one embodiment, the sentence s may be parsed into a dependency graph g by using an off-the-shelf parser. FIG. 6 is a diagram illustrating an example of a dependency graph 600 corresponding to the candidate sentence s. As illustrated, the dependency graph 600 may be designated as g.
[0060] The dependency graph g may be split into two subgraphs g' and g'' at the noun n. The node of noun n may be copied as n' included in g' and n'' included in g''. FIG. 7 is a diagram illustrating an example of the dependency graph 600 described above in FIG. 6 being split into two subgraphs 700 and 720. As illustrated, the subgraphs 700 and 720 may be designated as g' and g'', respectively. The node n' in subgraph 700 and the node n'' in subgraph 720 are the word 'cat'.

[0061] In one embodiment, two sets of subgraphs G' and G'' may be collected. The set of subgraphs G' includes g'1, g'2, ..., g'm, which are subgraphs of g'. The set of subgraphs G'' includes g''1, g''2, ..., g''n, which are subgraphs of g''.
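A sketch of the splitting step follows, assuming the dependency tree is given as a list of (head index, dependent index, label) edges: the modifier side of n (its own subtree) becomes g', and the remainder, with n copied in, becomes g''.

```python
def split_at_noun(edges, n_idx):
    """Split a dependency tree at the noun with token index n_idx.
    Returns (g_prime, g_double): g_prime holds the edges of n's own
    subtree (e.g. "A plaster cat"), g_double everything else, with n
    reachable through its head (e.g. "cat is playing a ball")."""
    children = {}
    for h, d, dep in edges:
        children.setdefault(h, []).append((h, d, dep))

    def descend(root, acc):
        for edge in children.get(root, []):
            acc.append(edge)
            descend(edge[1], acc)
        return acc

    g_prime = descend(n_idx, [])                        # n's modifier subtree
    below = {d for _, d, _ in g_prime}                  # nodes strictly under n
    g_double = [e for e in edges if e[1] not in below]  # rest of the tree, n kept
    return g_prime, g_double
```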
[0062] FIG. 8 is a diagram illustrating an example of two respective subgraphs 800 and 820 of the subgraphs 700 and 720 described above in FIG. 7. As illustrated, the subgraphs 800 and 820 are examples of g'i and g''j, respectively. For each pair of subgraphs g'i ∈ G' and g''j ∈ G'', a test tuple (n', n'', ?, g'i, g''j) may be generated. Thus, an example tuple may be ('cat' in the subgraph 800, 'cat' in the subgraph 820, ?, the subgraph 800, the subgraph 820).
[0063] Given a test tuple (n', n'', ?, g'i, g''j), the two words of n' and n'' in g'i and g''j may be replaced with a first label (e.g., NodeRef), and all the other words may be replaced with a second label (e.g., Node). Consequently, two new subgraphs called "subgraph patterns" may be generated. The subgraph patterns may be designated as pattern(g'i, n') and pattern(g''j, n'').
[0064] The two subgraph patterns pattern(g'i, n') and pattern(g''j, n'') may be combined to form a joined pattern. For example, the two NodeRef nodes of the subgraph patterns pattern(g'i, n') and pattern(g''j, n'') may be merged into a single node. The graph that combines pattern(g'i, n') and pattern(g''j, n'') may be referred to as pattern_join(g'i, n', g''j, n'').
[0065] FIG. 9 is a diagram illustrating an example of a joined pattern 900 based on the subgraph patterns 800 and 820 described above in FIG. 8. As illustrated, the joined pattern 900 may be designated as pattern_join(g'i, n', g''j, n''). In one embodiment, the pattern_join(g'i, n', g''j, n'') is generated in the same way as the method described above with reference to FIGS. 3 and 4.
[0066] For a test tuple (n', n'', ?, g'i, g''j), the words in the subgraph g'i except the word n' may be collected into a word sequence following the original sequence of the words in the sentence. Similarly, the words of g''j except n'' may be collected into a word sequence following the original sequence of the words in the sentence. The two sequences may then be joined, where n' is placed in the beginning of the joined sequence. The joined sequence may be designated as qij.

[0067] Given qij, the word sequence may be converted into a joint vector vij by replacing each word with its word vector, which may be generated by a word embedding method (e.g., word2vec), and by joining all the word vectors into a joint vector. In one embodiment, the joint vector vij is generated in the same way as the method described above regarding generating the joint vector vij for constructing machine learning based models.
[0068] Given a pattern_join(g'i, n', g''j, n''), if there is a corresponding trained machine learning based model Xk that is trained with a tuple subset Tk (e.g., constructed as described above with reference to FIGS. 1-5), then the vector vij may be used as the input of the trained machine learning based model, which generates a regression score between 0 and 1. If there is no corresponding trained machine learning based model, the pattern may be discarded.
[0069] In one embodiment, given a candidate sentence s and its tuples, for each tuple ti of the tuples, the corresponding model Xk is used to generate its regression score ri. The correctness score r of the sentence s is assigned as the weighted average of ri, i.e., the sum of ri × Wk divided by the sum of Wk.
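A sketch of this weighted-average scoring rule, with the weights Wk = e^length(Tk) defined in paragraph [0048]:

```python
import math

def sentence_score(tuple_scores):
    """tuple_scores: list of (r_i, length_of_Tk_for_the_scoring_model).
    Returns r = sum(r_i * W_k) / sum(W_k), with W_k = e^length(T_k)."""
    num = sum(r * math.exp(n) for r, n in tuple_scores)
    den = sum(math.exp(n) for _, n in tuple_scores)
    return num / den if den else 0.0
```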
[0070] In summary, the whole validation process described above with reference to FIGS. 6-9 can be formalized as the following function: validation-sentence(s, n) ∈ (0..1). By way of example, the sub-sequence of validating a test tuple can be formalized as a sub-function as follows: validation-tuple(n', n'', g'i, g''j) ∈ (0..1).
[0071] FIG. 10 is a flowchart 1000 of a method of validating a syntactic structure. In one embodiment, the syntactic structure may be a sentence. In one embodiment, the method may perform operations described above with reference to FIGS. 6-9. The method may be performed by a computing device. At 1002, the device may parse the syntactic structure into a dependency graph (e.g., the dependency graph 600).
[0072] At 1004, the device may split the dependency graph into a first subgraph (e.g., the subgraph 700) and a second subgraph (e.g., the subgraph 720) based on a subject (e.g., 'cat') within the syntactic structure. In one embodiment, the subject may be included in both the first subgraph and the second subgraph.
[0073] At 1006, the device may generate a plurality of tuples of the syntactic structure. Each tuple (e.g., the tuple (n', n'', ?, g'i, g''j)) may be based on the subject, a subgraph of the first subgraph, a subgraph of the second subgraph, and a relationship between the subject and the rest of the syntactic structure.
[0074] At 1008, the device may identify a trained machine learning based model (e.g., Xk) corresponding to each tuple. In one embodiment, to identify the trained machine learning based model, the device may normalize the tuple by replacing the subject with a first label and replacing other words within the first subgraph and the second subgraph with a second label. The device may further merge the first subgraph and the second subgraph via the first label to obtain a joined pattern. The device may then identify the trained machine learning based model based on the joined pattern.
[0075] At 1010, the device may estimate validity of the syntactic structure using the trained machine learning based models of the tuples. In one embodiment, to estimate the validity of the syntactic structure, the device may generate a word vector for the tuple, and feed the word vector as input to the trained machine learning based model. In one embodiment, to generate the word vector, the device may generate a joined word sequence that includes the subject, a first set of words in the first subgraph except the subject, and a second set of words in the second subgraph except the subject. The subject may be placed in the beginning of the joined word sequence followed by the first set of words and the second set of words. The first set of words and the second set of words may follow the original sequence of words in the sentence. The device may further convert the joined word sequence into the word vector.
[0076] In one embodiment, the device may determine the validity of the syntactic structure based on the outputs of the trained machine learning based models in response to the word vectors of the plurality of tuples of the syntactic structure. In one embodiment, to estimate the validity of the syntactic structure, the device may estimate the validity of the syntactic structure as the weighted sum of validity scores of the plurality of tuples of the syntactic structure.
[0077] In one embodiment, the machine learning based models (e.g., X) described above with reference to FIGS. 1-5 may be applied to pronoun resolution. Pronoun resolution may be treated as text coherence analysis. Thus, if the two syntactic structures of two sentences that contain a pronoun and its referent are coherent to each other, it is likely for the pronoun to refer to the referent.

[0078] In pronoun resolution, it is assumed that the input has one or two sentences, where the second (or the first, if only one) sentence has a pronoun p, and that the task is to identify the referent (or antecedent) of the pronoun in the first sentence.
[0079] In one embodiment, all nouns and noun phrases in the first sentence, designated as N, may be located. For each ni ∈ N and the given pronoun p, the subgraphs of the dependency structure of the first sentence, i.e. all g'j ∈ G' = {g'1, g'2, ..., g'm} that include ni, are collected, and the subgraphs of the dependency structure of the second sentence, i.e. all gk ∈ G = {g1, g2, ..., gn} that include p, are collected.
[0080] For each pair of subgraphs g'j ∈ G' and gk ∈ G, the validation process described above with reference to FIGS. 6-10 may be used to estimate the match between the two subgraphs by calculating validation-tuple(ni, p, g'j, gk). In one embodiment, the noun or noun phrase ni with the highest validation score for the given pronoun p may be considered as the correct referent of the pronoun p.
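A sketch of the referent-selection loop follows. Taking the maximum over subgraph pairs as the per-noun score is an assumed aggregation, and validation_tuple stands in for the sub-function formalized in paragraph [0070]; its implementation is presumed available.

```python
def resolve_pronoun(noun_subgraphs, pronoun, pronoun_subgraphs, validation_tuple):
    """noun_subgraphs: dict mapping each candidate noun ni to the subgraphs
    of the first sentence that contain ni; pronoun_subgraphs: subgraphs of
    the second sentence that contain p. Returns the noun with the highest
    validation score."""
    best, best_score = None, float("-inf")
    for n_i, subs in noun_subgraphs.items():
        score = max((validation_tuple(n_i, pronoun, gj, gk)
                     for gj in subs for gk in pronoun_subgraphs),
                    default=float("-inf"))
        if score > best_score:
            best, best_score = n_i, score
    return best
```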
[0081] FIG. 11 is a flowchart 1100 of a method of pronoun resolution. The method may be performed by a computing device. At 1102, the device may identify a set of subjects (e.g., N) in a first sentence and a pronoun (e.g., p) in a second sentence. The set of subjects may include nouns and noun phrases. The first sentence and the second sentence may be the same sentence or two different sentences.
[0082] At 1104, the device may generate a set of tuples based on the first sentence and the second sentence. Each tuple of the set of tuples may include a subject (e.g., ni) of the set of subjects, the pronoun, a first subgraph (e.g., g'j) of a first dependency graph corresponding to the first sentence, and a second subgraph (e.g., gk) of a second dependency graph corresponding to the second sentence.
[0083] At 1106, for each tuple (e.g., (ni, p, g'j, gk)) of the set of tuples, the device may identify a trained machine learning based model corresponding to the tuple. In one embodiment, to identify the trained machine learning based model corresponding to the tuple, the device may normalize the tuple by replacing the subject and the pronoun with a first label and replacing other words within the first subgraph and the second subgraph with a second label. The device may further merge the first subgraph and the second subgraph via the first label to obtain a joined pattern. The device may then identify the trained machine learning based model based on the joined pattern.

[0084] At 1108, for each tuple (e.g., (ni, p, g'j, gk)) of the set of tuples, the device may determine a validation score of the tuple using the trained machine learning based model corresponding to the tuple. In one embodiment, to determine the validation score of the tuple using the trained machine learning based model corresponding to the tuple, the device may generate a word vector for the tuple, and feed the word vector as input to the trained machine learning based model. To generate the word vector for the tuple, the device may generate a joined word sequence that includes the subject, a first set of words in the first subgraph except the subject, and a second set of words in the second subgraph except the pronoun. The subject may be placed in the beginning of the joined word sequence followed by the first set of words and the second set of words. The first set of words may follow the original sequence of words in the first sentence and the second set of words may follow the original sequence of words in the second sentence. The device may further convert the joined word sequence into the word vector.
[0085] At 1110, the device may determine a referent of the pronoun based on the validation scores. In one embodiment, a noun or noun phrase contained in a tuple that has the highest validation score is determined to be the referent of the pronoun. In one embodiment, the device may determine the validation score of each of the nouns and noun phrases based on outputs of the trained machine learning based models in response to the word vectors of tuples of the noun or noun phrase. The noun or noun phrase whose validation score is the highest may be determined to be the referent of the pronoun.
[0086] In one embodiment, the machine learning based models (e.g., X) described above with reference to FIGS. 1-5 may be applied to knowledge base completion. KB completion may be treated as text coherence analysis. In one embodiment, there may be two text structures: the syntactic structures of a knowledge template and its filler. If the two structures are coherent to each other, it is likely that the filler can fill in the template.
[0087] A knowledge base may consist of triples, designated as (head, relation, tail) or in short as (h, r, t). It is assumed that there are textual descriptions for each relation type r, designated as D(r), where such a description has two empty slots for head and tail.
[0088] In one embodiment, given a knowledge base KB, the set of all heads and tails from the existing triples of the KB, designated as S, may be collected. The set of all relation types of the KB, designated as R, may also be collected. If a head or tail is not a noun but a noun phrase, the headword of the noun phrase may be identified and added to S; the noun phrase itself is not added to S. If a head or tail is neither a noun nor a noun phrase (e.g., a verb phrase), the first noun in the head or tail may be identified and added to S. If a head or tail has no noun, it may be discarded.
[0089] In one embodiment, a set of candidate triples (h, r, t) may be generated, where h ∈ S, r ∈ R, t ∈ S, and (h, r, t) does not exist in the KB. For a candidate triple (h, r, t), a textual description may be randomly selected from D(r) and its two empty slots replaced with h and t; the resulting sentence is designated as d. The set of tuples (h, r, t, d) may be referred to as C.
[0090] Given a tuple (h, r, t, d) in C, the correctness of the triple may be decided by using the validation process described above with reference to FIGS. 6-10. That is, the correctness of the triple may be determined by calculating validation-sentence(d, h).
[0091] In one embodiment, if the value (e.g., validation score) is above a given threshold (e.g. 0.5), the tuple may be considered as positive/valid. The candidate tuples (h, r, t, d) that are validated in the previous step may be collected, and the corresponding triples (h, r, t) may be added to the KB.
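A sketch of the end-to-end candidate loop for knowledge base completion follows. The template store D(r) is modelled as a dict of format strings with {h} and {t} slots, and validation_sentence stands in for the function formalized in paragraph [0070]; both are illustrative assumptions, as is the random candidate sampling.

```python
import random

def complete_kb(kb_triples, heads_tails, relations, templates,
                validation_sentence, threshold=0.5, n_candidates=1000):
    """Generate candidate triples (h, r, t) not already in the KB, render a
    textual description d from a template for r, and keep the triple when
    validation_sentence(d, h) exceeds the threshold (e.g. 0.5)."""
    known = set(kb_triples)
    accepted = []
    for _ in range(n_candidates):
        h, t = random.sample(sorted(heads_tails), 2)  # assumes h != t
        r = random.choice(sorted(relations))
        if (h, r, t) in known:
            continue
        d = random.choice(templates[r]).format(h=h, t=t)  # textual description
        if validation_sentence(d, h) > threshold:
            accepted.append((h, r, t))
    return accepted
```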
[0092] FIG. 12 is a flowchart 1200 of a method of knowledge base completion. The method may be performed by a computing device. At 1202, the device may identify a plurality of heads and tails (e.g., S), and a plurality of relations (e.g., R) in a knowledge base including a plurality of triples, each triple including a head, a tail, and a relation between the head and the tail.
[0093] At 1204, the device may generate a candidate triple (e.g., (h, r, t)) including a candidate head, a candidate tail, and a candidate relation. The candidate head and the candidate tail may be selected from the plurality of heads and tails. The candidate relation may be selected from the plurality of relations. The candidate triple may be outside of the plurality of triples. That is, the candidate triple is not part of the knowledge base.
[0094] At 1206, the device may estimate the validity of a sentence formed based on the candidate triple. In one embodiment, to estimate the validity of the sentence, the device may parse the sentence into a dependency graph. The device may further split the dependency graph into a first subgraph and a second subgraph at the candidate head. The candidate head of a tuple may be included in both the first subgraph and the second subgraph of the tuple. The device may further generate a set of tuples for the dependency graph. Each tuple may be based on the candidate head, a subgraph of the first subgraph, and a subgraph of the second subgraph. The device may further identify a trained machine learning based model corresponding to each tuple. The device may then estimate the validity of the sentence using the trained machine learning based models of the set of tuples for the dependency graph.
[0095] In one embodiment, to identify the trained machine learning based model, the device may normalize the tuple by replacing the candidate head with a first label and replacing other words within the first subgraph and the second subgraph with a second label. The device may further merge the first subgraph and the second subgraph via the first label to obtain a joined pattern. The device may then identify the trained machine learning based model based on the joined pattern.
[0096] In one embodiment, to estimate the validity of the sentence using the trained machine learning based model, the device may generate a word vector for the tuple, and feed the word vector as input to the trained machine learning based model. To generate the word vector, the device may generate a joined word sequence that includes the candidate head, a first set of words in the first subgraph except the candidate head, and a second set of words in the second subgraph except the candidate head. The candidate head may be placed in the beginning of the joined word sequence followed by the first set of words and the second set of words. The first set of words and the second set of words may follow the original sequence of words in the sentence. The device may further convert the joined word sequence into the word vector. In one embodiment, the device may determine the validity of the sentence based on the outputs of the trained machine learning based models in response to the word vectors of the set of tuples.
[0097] At 1208, the device may add the candidate triple into the knowledge base when the sentence is estimated to be valid. For example, the device may add the candidate triple into the knowledge base when the output of the trained machine learning based model satisfies a threshold (e.g., above 0.5).
[0098] FIG. 13 is a conceptual data flow diagram 1300 illustrating the data flow between different means/components in an exemplary apparatus 1302. The apparatus 1302 may be a computing device. The apparatus 1302 may include a training component 1304 and/or an application component 1306.

[0099] The training component 1304 may receive a text corpus and construct machine learning based models in a distant-supervised manner based on the text corpus. In one configuration, the training component 1304 may perform the operations described above with reference to FIG. 5.
[00100] The application component 1306 may perform semantic coherence analysis of texts based on the trained machine learning based models, which may be constructed by the apparatus 1302 or one or more different devices. In one configuration, the application component 1306 may perform the operations described above with reference to FIG. 10, 11, or 12.
[00101] The apparatus may include additional components that perform each of the blocks of the algorithm in the aforementioned flowcharts of FIGS. 5, 10, 11, 12. As such, each block in the aforementioned flowcharts of FIGS. 5, 10, 11, 12 may be performed by a component and the apparatus may include one or more of those components. The components may be one or more hardware components specifically configured to carry out the stated processes/algorithm, implemented by a processor configured to perform the stated processes/algorithm, stored within a computer-readable medium for implementation by a processor, or some combination thereof.
[00102] The methods or functional modules of the various example embodiments as described hereinbefore may be implemented on a computer system, such as a computer system 1400 as schematically shown in FIG. 14 as an example only. The method or functional module may be implemented as software, such as a computer program being executed within the computer system 1400, and instructing the computer system 1400 to conduct the method of various example embodiments. The computer system 1400 may include a computer module 1402, input modules such as a keyboard 1404 and mouse 1406 and a plurality of output devices such as a display 1408, and a printer 1410. The computer module 1402 may be connected to a computer network 1412 via a suitable transceiver device 1414, to enable access to e.g. the Internet or other network systems such as Local Area Network (LAN) or Wide Area Network (WAN). The computer module 1402 in the example may include a processor 1418 for executing various instructions, a Random Access Memory (RAM) 1420 and a Read Only Memory (ROM) 1422. The computer module 1402 may also include a number of Input/Output (I/O) interfaces, for example I/O interface 1424 to the display 1408, and I/O interface 1426 to the keyboard 1404. The components of the computer module 1402 typically communicate via an interconnected bus 1428 and in a manner known to the person skilled in the relevant art.
[00103] It will be appreciated by a person skilled in the art that the terminology used herein is for the purpose of describing various embodiments only and is not intended to be limiting of the present invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
[00104] It is understood that the specific order or hierarchy of blocks in the processes / flowcharts disclosed is an illustration of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of blocks in the processes / flowcharts may be rearranged. Further, some blocks may be combined or omitted. The accompanying method claims present elements of the various blocks in a sample order, and are not meant to be limited to the specific order or hierarchy presented.
[00105] The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein, but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean "one and only one" unless specifically so stated, but rather "one or more." The word "exemplary" is used herein to mean "serving as an example, instance, or illustration." Any aspect described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other aspects. Unless specifically stated otherwise, the term "some" refers to one or more. Combinations such as "at least one of A, B, or C," "one or more of A, B, or C," "at least one of A, B, and C," "one or more of A, B, and C," and "A, B, C, or any combination thereof" include any combination of A, B, and/or C, and may include multiples of A, multiples of B, or multiples of C. Specifically, combinations such as "at least one of A, B, or C," "one or more of A, B, or C," "at least one of A, B, and C," "one or more of A, B, and C," and "A, B, C, or any combination thereof" may be A only, B only, C only, A and B, A and C, B and C, or A and B and C, where any such combinations may contain one or more member or members of A, B, or C. All structural and functional equivalents to the elements of the various aspects described throughout this disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims. Moreover, nothing disclosed herein is intended to be dedicated to the public regardless of whether such disclosure is explicitly recited in the claims. The words "module," "mechanism," "element," "device," and the like may not be a substitute for the word "means." As such, no claim element is to be construed as a means plus function unless the element is expressly recited using the phrase "means for."

Claims

What is claimed is:
1. A method of pronoun resolution, comprising:
identifying a set of subjects in a first sentence and a pronoun in a second sentence;
generating a set of tuples based on the first sentence and the second sentence, each tuple of the set of tuples including a subject of the set of subjects, the pronoun, a first subgraph of a first dependency graph corresponding to the first sentence, and a second subgraph of a second dependency graph corresponding to the second sentence;
for each tuple of the set of tuples, identifying a trained machine learning based model corresponding to the each tuple;
for each tuple of the set of tuples, determining a validation score of the each tuple using the trained machine learning based model corresponding to the each tuple; and
determining a referent of the pronoun based on the validation scores.
2. The method of claim 1, wherein the identifying the trained machine learning based model corresponding to the each tuple comprises:
normalizing the each tuple by replacing the subject and the pronoun with a first label and replacing other words within the first subgraph and the second subgraph with a second label;
merging the first subgraph and the second subgraph via the first label to obtain a joined pattern; and
identifying the trained machine learning based model based on the joined pattern.
3. The method of claim 1, wherein the determining the validation score of the each tuple using the trained machine learning based model corresponding to the each tuple comprises:
generating a word vector for the each tuple; and
feeding the word vector as input to the trained machine learning based model corresponding to the each tuple.
4. The method of claim 3, wherein the generating the word vector for the each tuple comprises:
generating a joined word sequence that includes the subject, a first set of words in the first subgraph except the subject, and a second set of words in the second subgraph except the pronoun, wherein the subject is placed in a beginning of the joined word sequence followed by the first set of words and the second set of words, wherein the first set of words follows an original sequence of words in the first sentence and the second set of words follows an original sequence of words in the second sentence; and
converting the joined word sequence into the word vector.
5. The method of claim 3, wherein the set of subjects includes nouns and noun phrases, wherein the first sentence and the second sentence are the same sentence or two different sentences.
6. The method of claim 5, further comprising determining the validation score of each of the nouns and noun phrases based on outputs of the trained machine learning based models in response to the word vectors of tuples of the noun or noun phrase.
7. The method of claim 6, wherein the noun or noun phrase whose validation score is the highest is determined to be the referent of the pronoun.
8. A method of knowledge base completion, comprising:
identifying a plurality of heads and tails, and a plurality of relations in a knowledge base comprising a plurality of triples, each triple of the plurality of triples comprising a head, a tail, and a relation between the head and the tail;
generating a candidate triple comprising a candidate head, a candidate tail, and a candidate relation, the candidate head and the candidate tail selected from the plurality of heads and tails, the candidate relation selected from the plurality of relations, the candidate triple being outside of the plurality of triples; and
estimating validity of a sentence formed based on the candidate triple.
9. The method of claim 8, wherein the estimating the validity of the sentence comprises:
parsing the sentence into a dependency graph;
splitting the dependency graph into a first subgraph and a second subgraph at the candidate head;
generating a set of tuples for the dependency graph, wherein each tuple is based on the candidate head, a subgraph of the first subgraph, and a subgraph of the second subgraph;
identifying a trained machine learning based model corresponding to each tuple; and
estimating validity of the sentence using the trained machine learning based models of the set of tuples for the dependency graph.
10. The method of claim 9, wherein the candidate head of a tuple is included in both the first subgraph and the second subgraph of the tuple.
11. The method of claim 9, wherein the identifying the trained machine learning based model comprises:
normalizing the tuple by replacing the candidate head with a first label and replacing other words within the first subgraph and the second subgraph with a second label;
merging the first subgraph and the second subgraph via the first label to obtain a joined pattern; and
identifying the trained machine learning based model based on the joined pattern.
12. The method of claim 9, wherein the estimating the validity of the sentence using the trained machine learning based model comprises:
generating a word vector for the tuple; and
feeding the word vector as input to the trained machine learning based model.
13. The method of claim 12, wherein the generating the word vector comprises:
generating a joined word sequence that includes the candidate head, a first set of words in the first subgraph except the candidate head, and a second set of words in the second subgraph except the candidate head, wherein the candidate head is placed in a beginning of the joined word sequence followed by the first set of words and the second set of words, wherein the first set of words and the second set of words follow an original sequence of words in the sentence; and
converting the joined word sequence into the word vector.
14. The method of claim 12, further comprising determining the validity of the sentence based on outputs of the trained machine learning based models in response to the word vectors of the set of tuples.
15. The method of claim 14, further comprising adding the candidate triple into the knowledge base when the output of the trained machine learning based model satisfies a threshold.
16. An apparatus for pronoun resolution, comprising:
a memory; and
at least one processor coupled to the memory and configured to:
identify a set of subjects in a first sentence and a pronoun in a second sentence;
generate a set of tuples based on the first sentence and the second sentence, each tuple of the set of tuples including a subject of the set of subjects, the pronoun, a first subgraph of a first dependency graph corresponding to the first sentence, and a second subgraph of a second dependency graph corresponding to the second sentence;
for each tuple of the set of tuples, identify a trained machine learning based model corresponding to the each tuple;
for each tuple of the set of tuples, determine a validation score of the each tuple using the trained machine learning based model corresponding to the each tuple; and
determine a referent of the pronoun based on the validation scores.
17. The apparatus of claim 16, wherein, to identify the trained machine learning based model corresponding to the each tuple, the at least one processor is configured to:
normalize the each tuple by replacing the subject and the pronoun with a first label and replacing other words within the first subgraph and the second subgraph with a second label;
merge the first subgraph and the second subgraph via the first label to obtain a joined pattern; and
identify the trained machine learning based model based on the joined pattern.
18. The apparatus of claim 16, wherein, to determine the validation score of the each tuple using the trained machine learning based model corresponding to the each tuple, the at least one processor is configured to:
generate a word vector for the each tuple; and
feed the word vector as input to the trained machine learning based model corresponding to the each tuple.
19. The apparatus of claim 18, wherein, to generate the word vector for the each tuple, the at least one processor is configured to:
generate a joined word sequence that includes the subject, a first set of words in the first subgraph except the subject, and a second set of words in the second subgraph except the pronoun, wherein the subject is placed in a beginning of the joined word sequence followed by the first set of words and the second set of words, wherein the first set of words follows an original sequence of words in the first sentence and the second set of words follows an original sequence of words in the second sentence; and
convert the joined word sequence into the word vector.
20. The apparatus of claim 16, wherein the set of subjects includes nouns and noun phrases, wherein a noun or noun phrase contained in a tuple that has a highest validation score is determined to be the referent of the pronoun.
PCT/SG2017/050153 2017-03-24 2017-03-24 Method and apparatus for semantic coherence analysis of texts WO2018174815A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/SG2017/050153 WO2018174815A1 (en) 2017-03-24 2017-03-24 Method and apparatus for semantic coherence analysis of texts


Publications (1)

Publication Number Publication Date
WO2018174815A1 true WO2018174815A1 (en) 2018-09-27

Family

ID=63585666

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2017/050153 WO2018174815A1 (en) 2017-03-24 2017-03-24 Method and apparatus for semantic coherence analysis of texts

Country Status (1)

Country Link
WO (1) WO2018174815A1 (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015170963A1 (en) * 2014-05-05 2015-11-12 Mimos Berhad System and method for automatically generating a knowledge base
US20160253309A1 (en) * 2015-02-26 2016-09-01 Sony Corporation Apparatus and method for resolving zero anaphora in chinese language and model training method
CN105138507A (en) * 2015-08-06 2015-12-09 电子科技大学 Pattern self-learning based Chinese open relationship extraction method

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
AUGENSTEIN I. ET AL.: "Distantly supervised Web relation extraction for knowledge base population", SEMANTIC WEB JOURNAL, vol. 7, no. 4, 27 May 2016 (2016-05-27), pages 335 - 349, XP055547838, [retrieved on 20170620] *
BERGSMA S. ET AL.: "Bootstrapping Path-Based Pronoun Resolution", PROCEEDINGS OF THE 21ST INTERNATIONAL CONFERENCE ON COMPUTATIONAL LINGUISTICS AND THE 44TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 21 July 2006 (2006-07-21), pages 33 - 40, XP058301550, [retrieved on 20170620] *
KONG F. ET AL.: "Dependency-driven Anaphoricity Determination for Coreference Resolution", PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON COMPUTATIONAL LINGUISTICS (COLING 2010, 27 August 2010 (2010-08-27), pages 599 - 607, XP055547842, [retrieved on 20170620] *
LI X. ET AL.: "Commonsense knowledge base completion", PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 12 August 2016 (2016-08-12), Berlin, Germany, pages 1445 - 1455, XP055547836, [retrieved on 20170620] *
YANG X. ET AL.: "Kernel based pronoun resolution with structured syntactic knowledge", PROCEEDINGS OF THE 21ST INTERNATIONAL CONFERENCE ON COMPUTATIONAL LINGUISTICS AND THE 44TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 17 July 2006 (2006-07-17), pages 41 - 48, XP058301551, [retrieved on 20170620] *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110162785A (en) * 2019-04-19 2019-08-23 腾讯科技(深圳)有限公司 Data processing method and pronoun clear up neural network training method
WO2020211720A1 (en) * 2019-04-19 2020-10-22 腾讯科技(深圳)有限公司 Data processing method and pronoun resolution neural network training method
US11983493B2 (en) 2019-04-19 2024-05-14 Tencent Technology (Shenzhen) Company Limited Data processing method and pronoun resolution neural network training method
CN110287291A (en) * 2019-07-03 2019-09-27 桂林电子科技大学 A kind of unsupervised English short essay sentence is digressed from the subject analysis method
CN110287291B (en) * 2019-07-03 2021-11-02 桂林电子科技大学 Unsupervised method for analyzing running questions of English short sentences
CN110516244A (en) * 2019-08-26 2019-11-29 西安艾尔洛曼数字科技有限公司 A kind of sentence Research on Automatic Filling based on BERT
CN110516244B (en) * 2019-08-26 2023-03-24 西安艾尔洛曼数字科技有限公司 Automatic sentence filling method based on BERT
CN110750979A (en) * 2019-10-17 2020-02-04 科大讯飞股份有限公司 Method for determining continuity of chapters and detection device
WO2021169351A1 (en) * 2020-02-24 2021-09-02 华为技术有限公司 Method and apparatus for anaphora resolution, and electronic device

Similar Documents

Publication Publication Date Title
Pittke et al. Automatic detection and resolution of lexical ambiguity in process models
WO2018174816A1 (en) Method and apparatus for semantic coherence analysis of texts
US10055402B2 (en) Generating a semantic network based on semantic connections between subject-verb-object units
US11210468B2 (en) System and method for comparing plurality of documents
Al-Hroob et al. The use of artificial neural networks for extracting actions and actors from requirements document
CN111611810B (en) Multi-tone word pronunciation disambiguation device and method
WO2018174815A1 (en) Method and apparatus for semantic coherence analysis of texts
CN112069295B (en) Similar question recommendation method and device, electronic equipment and storage medium
CN111382571B (en) Information extraction method, system, server and storage medium
US11170169B2 (en) System and method for language-independent contextual embedding
CN109614620B (en) HowNet-based graph model word sense disambiguation method and system
Singh et al. A decision tree based word sense disambiguation system in Manipuri language
SABRIYE et al. AN APPROACH FOR DETECTING SYNTAX AND SYNTACTIC AMBIGUITY IN SOFTWARE REQUIREMENT SPECIFICATION.
CN111859858A (en) Method and device for extracting relationship from text
Tang et al. Two step joint model for drug drug interaction extraction
Malik et al. Named Entity Recognition on Software Requirements Specification Documents.
CN113705207A (en) Grammar error recognition method and device
WO2023088278A1 (en) Method and apparatus for verifying authenticity of expression, and device and medium
Gruzdo et al. Application of Paragraphs Vectors Model for Semantic Text Analysis.
Berzins et al. Innovations in natural language document processing for requirements engineering
Kirsch et al. Noise reduction in distant supervision for relation extraction using probabilistic soft logic
CN114970516A (en) Data enhancement method and device, storage medium and electronic equipment
Acharjee et al. Sequence-to-sequence learning-based conversion of pseudo-code to source code using neural translation approach
Arachchi AI Based UML Diagrams Generator
Chang et al. Zero pronoun identification in chinese language with deep neural networks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 17901473
Country of ref document: EP
Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 17901473
Country of ref document: EP
Kind code of ref document: A1