US20020046018A1 - Discourse parsing and summarization - Google Patents
- Publication number: US 2002/0046018 A1 (U.S. application Ser. No. 09/854,301)
- Authority: US (United States)
- Legal status: Abandoned
Classifications
- G06F16/345—Summarisation for human users
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
- G06F40/216—Parsing using statistical methods
- G06F40/253—Grammatical analysis; Style critique
- G06F40/30—Semantic analysis
- G06F40/35—Discourse or dialogue representation
- G06F40/44—Statistical methods, e.g. probability models
Definitions
- the present application relates to computational linguistics and more particularly to techniques for parsing a text to determine its underlying rhetorical, or discourse, structure, and to techniques for summarizing, or compressing, text.
- Computational linguistics is the study of the applications of computers in processing and analyzing language, as in automatic machine translation (“MT”) and text analysis.
- researchers have developed and frequently use various types of tree structures to graphically represent the structure of a text segment (e.g., clause, sentence, paragraph or entire treatise).
- Two basic tree types include (1) the syntactic tree, which can be used to graphically represent the syntactic relations among components of a text segment, and (2) the rhetorical tree (equivalently, the rhetorical structure tree (RST) or the discourse tree), which can be used to graph the rhetorical relationships among components of a text segment.
- Rhetorical structure trees are discussed in detail in William C. Mann and Sandra A. Thompson, "Rhetorical structure theory: Toward a functional theory of text organization," Text, 8(3):243-281 (1988).
- FIG. 1 shows the types of structures in a discourse tree 100 for a text fragment.
- the leaves 102 of the tree correspond to elementary discourse units (“edus”) and the internal nodes correspond to contiguous text spans.
- Each node in a discourse tree is characterized by a “status” (i.e., either “nucleus” or “satellite”) and a “rhetorical relation,” which is a relation that holds between two non-overlapping text spans.
- nuclei 104 are represented by straight lines while satellites 106 are represented by arcs.
- The distinction between nuclei and satellites comes from empirical observations that a nucleus expresses information that is more essential than a satellite to the writer's intention, and that the nucleus of a rhetorical relation is comprehensible independent of the satellite but not vice versa. When spans are equally important, the relation is said to be "multinuclear."
- Rhetorical relations reflect semantic, intentional and/or textual relations that hold between text spans.
- Examples of rhetorical relations include the following types indicated in capitals: one text span may ELABORATE on another text span; the information in two text spans may be in CONTRAST; and the information in one text span may provide JUSTIFICATION for the information presented in another text span.
- Other types of rhetorical relations include EVIDENCE, BACKGROUND, JOINT, and CAUSE.
- the internal nodes of discourse tree 100 are labeled with their respective rhetorical relation names 108 .
- Implementations of the disclosed discourse parsing system and techniques may include various combinations of the following features.
- a discourse structure for an input text segment (e.g., a clause, a sentence, a paragraph or a treatise) is determined by generating a set of one or more discourse parsing decision rules based on a training set, and determining a discourse structure for the input text segment by applying the generated set of discourse parsing decision rules to the input text segment.
- the training set may include a plurality of annotated text segments (e.g., built manually by human annotators) and a plurality of elementary discourse units (edus).
- Each annotated text segment may be associated with a set of edus that collectively represent the annotated text segment.
- Generating the set of discourse parsing decision rules may include iteratively performing one or more operations (e.g., a shift operation and one or more different types of reduce operations) on a set of edus to incrementally build the annotated text segment associated with the set of edus.
- the different types of reduce operations may include one or more of the following six operations: reduce-ns, reduce-sn, reduce-nn, reduce-below-ns, reduce-below-sn, reduce-below-nn.
- the six reduce operations and the shift operation may be sufficient to derive the discourse tree of any input text segment.
- Determining a discourse structure may include incrementally building a discourse tree for the input text segment, for example, by selectively combining elementary discourse trees (edts) into larger discourse tree units. Moreover, incrementally building a discourse tree for the input text segment may include performing operations on a stack and an input list of edts, one edt for each edu in a set of edus corresponding to the input text segment.
- Prior to determining the discourse structure for the input text segment, the input text segment may be segmented into edus, which are inserted into the input list. Segmenting the input text segment into edus may be performed by applying a set of automatically learned discourse segmenting decision rules to the input text segment. Generating the set of discourse segmenting decision rules may be accomplished by analyzing a training set.
- Determining the discourse structure for the input text segment may further include segmenting the input text segment into elementary discourse units (edus); incrementally building a discourse tree for the input text segment by performing operations on the edus to selectively combine the edus into larger discourse tree units; and repeating the incremental building of the discourse tree until all of the edus have been combined.
- text parsing may include generating a set of one or more discourse segmenting decision rules based on a training set, and determining boundaries in an input text segment by applying the generated set of discourse segmenting decision rules to the input text segment.
- Determining boundaries may include examining each lexeme in the input text segment in order, and, for example, assigning, for each lexeme, one of the following designations: sentence-break, edu-break, start-parenthetical, end-parenthetical, and none.
- determining boundaries in the input text segment may include recognizing sentence boundaries, edu boundaries, parenthetical starts, and parenthetical ends.
- Examining each lexeme in the input text segment may include associating features with the lexeme based on surrounding context.
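The boundary-labeling pass described above can be sketched as a loop that assigns one of the five designations to each lexeme. This is an illustrative sketch only: the predicates here stand in for the automatically learned decision rules, and the toy rule is a hand-written assumption, not a rule from the patent.

```python
# Sketch of per-lexeme boundary labeling. The learned decision rules are
# abstracted as (predicate, label) pairs; predicates may inspect the
# surrounding context. All rule content below is illustrative.

LABELS = {"sentence-break", "edu-break", "start-parenthetical",
          "end-parenthetical", "none"}

def label_lexemes(lexemes, rules):
    """Assign one boundary designation per lexeme using an ordered rule list."""
    out = []
    for i, lex in enumerate(lexemes):
        label = "none"                       # default when no rule fires
        for predicate, candidate in rules:
            if predicate(lexemes, i):
                label = candidate
                break
        out.append((lex, label))
    return out

# Toy stand-in rule: a period marks a sentence break.
toy_rules = [(lambda lxs, i: lxs[i] == ".", "sentence-break")]

labels = label_lexemes(["John", "slept", "."], toy_rules)
```

In the actual system the rule set is induced from a training corpus rather than written by hand, but the control flow over lexemes is the same.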
- generating discourse trees may include segmenting an input text segment into edus, and incrementally building a discourse tree for the input text segment by performing operations on the edus to selectively combine the edus into larger discourse tree units.
- the incremental building of the discourse tree may be repeated until all of the edus have been combined into a single discourse tree.
- the incremental building of the discourse tree is based on predetermined decision rules, such as automatically learned decision rules generated by analyzing a training set of annotated discourse trees.
- a discourse parsing system may include a plurality of automatically learned decision rules; an input list comprising a plurality of edts, each edt corresponding to an edu of an input text segment; a stack for holding discourse tree segments while a discourse tree for the input text segment is being built; and a plurality of operators for incrementally building the discourse tree for the input text segment by selectively combining the edts into a discourse tree segment according to the plurality of decision rules and moving the discourse tree segment onto the stack.
- the system may further include a discourse segmenter for partitioning the input text segment into edus and inserting the edus into the input list.
- discourse parsing systems and techniques as described herein.
- the systems and techniques described here result in a discourse parsing system that uses a set of learned decision rules to automatically determine the underlying discourse structure of any unrestricted text.
- the discourse parsing system can be used, among other ways, for constructing discourse trees whose leaves are sentences (or units that can be identified at high levels of performance).
- the time, expense, and inconsistencies associated with manually built discourse tree derivation rules are reduced dramatically.
- the rhetorical parsing algorithm described herein implements robust lexical, syntactic and semantic knowledge sources. Moreover, the six reduce operations used by the parsing algorithm, along with the shift operation, are mathematically sufficient to derive the discourse structure of any input text.
- Text summarization (also referred to as text compression) is the process of taking a longer unit of text (e.g., a long sentence, a paragraph, or an entire treatise) and converting it into a shorter unit of text (e.g., a short sentence or an abstract) referred to as a summary.
- Automated summarization, that is, using a computer or other automated process to produce a summary, has many applications, for example, in information retrieval, abstracting, automatic test scoring, headline generation, television captioning, and audio scanning services for the blind.
- FIG. 10 shows a block diagram of an automated summarization process.
- an input text 1000 is provided to a summarizer 1002 , which generates a summary 1004 of the input text 1000 .
- a summary will capture the most salient aspects of the longer text and present them in a coherent fashion. For example, when humans produce summaries of documents, they do not simply extract sentences, clauses, or keywords, and then concatenate them to form a summary. Rather, humans attempt to summarize by rewriting the longer text, for example, by constructing new sentences that are grammatical, that cohere with one another, and that capture the most salient items of information in the original document.
- Implementations of the disclosed summarization systems and techniques may include various combinations of the following features.
- a tree structure (e.g., a discourse tree or a syntactic tree) is summarized by generating a set of one or more summarization decision rules (e.g., automatically learned decision rules) based on a training set, and compressing the tree structure by applying the generated set of summarization decision rules to the tree structure.
- the tree structure to be compressed may be generated by parsing an input text segment such as a clause, a sentence, a paragraph, or a treatise.
- the compressed tree structure may be converted into a summarized text segment that is grammatical and coherent.
- the summarized text segment may include sentences not present in a text segment from which the pre-compressed tree structure was generated.
- Applying the generated set of summarization decision rules comprises performing a sequence of modification operations on the tree structure, for example, one or more of a shift operation, a reduce operation, and a drop operation.
- the reduce operation may combine a plurality of trees into a larger tree, and the drop operation may delete constituents from the tree structure.
- the training set used to generate the decision rules may include pre-generated long/short tree pairs.
- Generating the set of summarization decision rules comprises iteratively performing one or more tree modification operations on a long tree until the paired short tree is realized.
- a plurality of long/short tree pairs may be processed to generate a plurality of learning cases.
- generating the set of decision rules may include applying a learning algorithm to the plurality of learning cases.
- one or more features may be associated with each of the learning cases to reflect context.
- a computer-implemented summarization method may include generating a parse tree (e.g., a discourse tree or a syntactic tree) for an input text segment, and iteratively reducing the generated parse tree by selectively eliminating portions of the parse tree. Iterative reduction of the parse tree may be performed based on a plurality of learned decision rules, and may include performing tree modification operations on the parse tree.
- the tree modification operations may include one or more of the following: a shift operation, a reduce operation (which, for example, combines a plurality of trees into a larger tree), and a drop operation (which, for example, deletes constituents from the tree structure).
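The drop operation above can be illustrated on a small parse tree. This is a minimal sketch under assumed representations: the `Node` class, the constituent labels, and the example tree are all illustrative and not taken from the patent.

```python
# Sketch of the "drop" tree-modification operation: deleting constituents
# with a given label from a parse tree. Node structure and labels are
# assumptions for illustration.

class Node:
    def __init__(self, label, children=None, word=None):
        self.label = label
        self.children = children or []
        self.word = word

def drop(node, target_label):
    """Return a copy of the tree with all subtrees labeled target_label removed."""
    kept = [drop(c, target_label) for c in node.children
            if c.label != target_label]
    return Node(node.label, kept, node.word)

def words(node):
    """Read the leaf words off a tree, left to right."""
    if node.word is not None:
        return [node.word]
    return [w for c in node.children for w in words(c)]

# "the very tall man" -> drop the adjective phrase -> "the man"
tree = Node("NP", [Node("DT", word="the"),
                   Node("ADJP", [Node("RB", word="very"),
                                 Node("JJ", word="tall")]),
                   Node("NN", word="man")])
compressed = drop(tree, "ADJP")
```

In the decision-based summarizer, which constituents to drop is decided by the learned rules rather than by a fixed target label as in this toy example.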
- summarization is accomplished by parsing an input text segment to generate a parse tree (e.g., a discourse tree or a syntactic tree) for the input segment, generating a plurality of potential solutions, applying a statistical model to determine a probability of correctness for each potential solution, and extracting one or more high-probability solutions based on the solutions' respective determined probabilities of correctness.
- Applying a statistical model may include using a stochastic channel model algorithm that, for example, performs minimal operations on a small tree to create a larger tree.
- using a stochastic channel model algorithm may include probabilistically choosing an expansion template.
- Generating a plurality of potential solutions may include identifying a forest of potential compressions for the parse tree.
- the generated parse tree may have one or more nodes, each node having N children (wherein N is an integer).
- identifying a forest of potential compressions may include generating 2^N - 1 new nodes, one node for each non-empty subset of the children, and packing the newly generated nodes into a whole.
- identifying a forest of potential compressions may include assigning an expansion-template probability to each node in the forest.
- Extracting one or more high-probability solutions may include selecting one or more trees based on a combination of each tree's word-bigram and expansion-template score. For example, a list of trees may be selected, one for each possible compression length.
- the potential solutions may be normalized for compression length. For example, for each potential solution, a log-probability of correctness for the solution may be divided by a length of compression for the solution.
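The subset enumeration and length normalization described above can be sketched directly. This is an assumed illustration: the forest packing and expansion-template probabilities of the full system are omitted, and the function names are hypothetical.

```python
from itertools import combinations

# For a node with N children, every non-empty subset of the children is a
# candidate compressed node, giving 2^N - 1 candidates per node.

def child_subsets(children):
    """Enumerate all non-empty subsets of a node's children."""
    subsets = []
    for r in range(1, len(children) + 1):
        subsets.extend(combinations(children, r))
    return subsets

# Length normalization used to compare compressions of different sizes:
# divide the log-probability of a solution by its compression length.
def normalized_score(log_prob, compression_length):
    return log_prob / compression_length

subs = child_subsets(["DT", "JJ", "NN"])   # 2**3 - 1 == 7 candidate nodes
```

A shorter compression with a worse raw log-probability can still win after normalization, which is why the division by length matters when selecting one tree per possible compression length.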
- the disclosed summarizer generates summaries automatically, e.g., in a computer-implemented manner. Accordingly, the inconsistencies, errors, time and/or expense typically incurred with conventional approaches that require manual intervention are reduced dramatically.
- the two different embodiments of the summarizer both generate coherent, grammatical results, but each potentially provides different advantages.
- the channel-based summarizer provides multiple different solutions at varying levels of compression. These multiple solutions may be desirable if, for example, the output of the summarizer was being provided to a user (e.g., human or computer process) that could make use of multiple outputs.
- the decision-based summarizer is deterministic and thus provides a single solution and does so very quickly. Accordingly, depending on the objectives of the user, the decision-based summarizer may be advantageous both for its speed and for its deterministic approach.
- the channel-based summarizer may be advantageous depending on a user's objectives because its performance can be adjusted, or fine-tuned, to a particular application by replacing or adjusting its statistical model.
- performance of the decision-based summarizer can be fine-tuned to a particular application by varying the training corpus used to learn decision rules.
- a decision-based summarizer could be tailored to summarize text or trees in a specific discipline by selecting a training corpus specific to that discipline.
- FIG. 1 shows an example of a discourse tree.
- FIG. 2 is a flowchart of generating a discourse tree for an input text.
- FIG. 3 is a block diagram of a discourse tree generating system.
- FIG. 4 shows an example of shift-reduce operations performed in discourse parsing a text.
- FIG. 5 shows the operational semantics of six reduce operations.
- FIG. 6 is a flowchart of generating decision rules for a discourse segmenter.
- FIG. 6A shows examples of automatically derived segmenting rules.
- FIG. 7 is a graph of a learning curve for a discourse segmenter.
- FIG. 8 is a flowchart of generating decision rules for a shift-reduce action identifier.
- FIG. 8A shows examples of automatically derived shift-reduce rules.
- FIG. 8B shows a result of applying Rule 1 in FIG. 8A on the edts that correspond to the units in text example (5.1).
- FIG. 8C shows a result of applying Rule 2 in FIG. 8A on the edts that correspond to the units in text example (5.2).
- FIG. 8D shows an example of a CONTRAST relation that holds between two paragraphs.
- FIG. 8E shows a result of applying Rule 4 in FIG. 8A on the trees that subsume the two paragraphs in FIG. 8D.
- FIG. 9 is a graph of a learning curve for a shift-reduce action identifier.
- FIG. 10 is a block diagram of an automated summarization system.
- FIG. 11 shows examples of parse (or syntactic) trees.
- FIG. 12 shows examples of text from a training corpus.
- FIG. 13 is a graph of adjusted log-probabilities for top scoring compressions at various compression lengths.
- FIG. 14 shows an example of incremental tree compression.
- FIG. 15 shows examples of text compression.
- FIG. 16 shows examples of summarizations of varying compression lengths.
- FIG. 17 is a flowchart of a channel-based summarization process.
- FIG. 18 is a flowchart of a process for training a channel-based summarizer.
- FIG. 18A shows examples of rules that were learned automatically by the C4.5 program.
- FIG. 19 is a flowchart of a decision-based summarization process.
- FIG. 20 is a flowchart of a process for training a decision-based summarizer.
- a decision-based rhetorical parsing system (equivalently, a discourse parsing system) automatically derives the discourse structure of unrestricted texts and incrementally builds corresponding discourse trees based on a set of learned decision rules.
- the discourse parsing system uses a shift-reduce rhetorical parsing algorithm that learns to construct rhetorical structures of texts from a corpus of discourse-parse action sequences.
- the rhetorical parsing algorithm implements robust lexical, syntactic and semantic knowledge sources.
- the resulting output of the discourse parsing system is a rhetorical tree.
- This functionality is useful both in its standalone form (e.g., as a tool for linguistic researchers) and as a component of a larger system, such as in a discourse-based machine translation system, as described in Daniel Marcu et al., "The Automatic Translation of Discourse Structures," Proceedings of the First Annual Meeting of the North American Chapter of the Association for Computational Linguistics, pp. 9-17, Seattle, Washington (April 29-May 3, 2000), and Daniel Marcu, "The Theory and Practice of Discourse Parsing and Summarization," The MIT Press (2000), both of which are incorporated herein by reference.
- FIG. 2 shows a flowchart of a discourse parsing process 200 that generates a discourse tree from an input text.
- the process 200 breaks the text into elementary discourse units, or “edus.”
- Edus are defined functionally as clauses or clause-like units that are unequivocally the nucleus or satellite of a rhetorical relation that holds between two adjacent spans of text. Further details of edus are discussed below.
- In step 206, the edus are put into an input list.
- In step 208, the process 200 uses the input list, a stack, and a set of learned decision rules to perform the shift-reduce rhetorical parsing algorithm, which eventually yields the discourse structure of the text given as input.
- In step 210, performing the algorithm results in the generation of a discourse tree that corresponds to the input text.
- FIG. 3 shows a block diagram of a discourse tree generating system 300 that takes in input text 301 and produces discourse tree 305 .
- the system 300 as shown includes two sub-systems: (1) a discourse segmenter 302 that identifies the edus in a text, and (2) a discourse parser 304 (equivalently, a shift-reduce action identifier), which determines how the edus should be assembled into rhetorical structure trees.
- the discourse segmenter 302 which serves as a front-end to the discourse parser 304 , partitions the input text into edus.
- the discourse segmenter processes an input text one lexeme (word or punctuation mark) at a time and recognizes sentence and edu boundaries and beginnings and ends of parenthetical units.
- the discourse parser 304 takes in the edus from the segmenter 302 and applies the shift-reduce algorithm to incrementally build the discourse tree 305 .
- each of the discourse segmenter 302 and the discourse parser 304 performs its operations based on a set of decision rules that were learned from analyzing a training set, as discussed in detail below.
- An alternative embodiment is possible, however, in which substantially the same results could be achieved using probabilistic rules.
- the training corpus (equivalently, the training set) used was a body of manually built (i.e., by humans) rhetorical structure trees. This corpus, which included 90 texts that were manually annotated with discourse trees, was used to generate learning cases of how texts should be partitioned into edus and how discourse units and segments should be assembled into discourse trees.
- a corpus of 90 rhetorical structure trees was used, which were built manually using rhetorical relations that were defined informally in the style of Mann et al., "Rhetorical structure theory: Toward a functional theory of text organization," Text, 8(3):243-281 (1988): 30 trees were built for short personal news stories from the MUC7 co-reference corpus (Hirschman et al., MUC-7 Coreference Task Definition, 1997); 30 trees for scientific texts from the Brown corpus; and 30 trees for editorials from the Wall Street Journal (WSJ). The average number of words for each text was 405 in the MUC corpus, 2029 in the Brown corpus, and 878 in the WSJ corpus. Each MUC text was tagged by three annotators; each Brown and WSJ text was tagged by two annotators.
- each text is a (possibly non-binary) tree whose leaves correspond to elementary discourse units (edus), and whose internal nodes correspond to contiguous text spans.
- Each internal node is characterized by a rhetorical relation, such as ELABORATION and CONTRAST.
- Each relation holds between two non-overlapping text spans called NUCLEUS and SATELLITE.
- some relations, such as SEQUENCE and CONTRAST, are multinuclear.
- the distinction between nuclei and satellites comes from the empirical observation that the nucleus expresses what is more essential to the writer's purpose than the satellite.
- Each node in the tree is also characterized by a promotion set that denotes the units that are important in the corresponding subtree.
- the promotion sets of leaf nodes are the leaves themselves.
- the promotion sets of internal nodes are given by the union of the promotion sets of the immediate nuclei nodes.
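The promotion-set definition above is a short recursion: a leaf promotes itself, and an internal node promotes the union of the promotion sets of its immediate nucleus children. The following sketch uses an assumed dictionary representation of tree nodes for illustration.

```python
# Sketch of promotion-set computation. A node is either a leaf
# {"edu": id} or an internal node {"children": [(status, subtree), ...]},
# where status is "nucleus" or "satellite". Representation is assumed.

def promotion_set(node):
    """Return the set of edu ids promoted by this subtree."""
    if "edu" in node:
        return {node["edu"]}          # a leaf promotes itself
    result = set()
    for status, child in node["children"]:
        if status == "nucleus":       # only nuclei contribute upward
            result |= promotion_set(child)
    return result

tree = {"children": [
    ("nucleus",   {"edu": 1}),
    ("satellite", {"edu": 2}),
]}
```

For a multinuclear node, several children have status "nucleus", so the promotion set is the union over all of them.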
- Some edus may contain parenthetical units, i.e., embedded units whose deletion does not affect the understanding of the edu to which they belong.
- the unit shown in italics in text (2), below, is parenthetic.
- the annotation process involved assigning edu and parenthetical unit boundaries, assembling edus and spans into discourse trees, and labeling the relations between edus and spans with rhetorical relation names from a taxonomy of 71 relations. No explicit distinction was made between intentional, informational, and textual relations. In addition, two constituency relations were marked that were ubiquitous in the corpus and that often subsumed complex rhetorical constituents. These relations were ATTRIBUTION, which was used to label the relation between a reporting and a reported clause, and APPOSITION.
- the rhetorical tagging tool used—namely, the RST Annotation Tool downloadable from, and described at:
- the discourse parsing process is modeled as a sequence of shift-reduce operations.
- the input to the parser is an empty stack and an input list that contains a sequence of elementary discourse trees (“edts”), one edt for each edu produced by the discourse segmenter.
- the status and rhetorical relation associated with each edt is “UNDEFINED”, and the promotion set is given by the corresponding edu.
- the parser applies a “Shift” or a “Reduce” operation.
- Shift operations transfer the first edt of the input list to the top of the stack.
- Reduce operations pop the two discourse trees located on the top of the stack; combine them into a new tree updating the statuses, rhetorical relation names, and promotion sets associated with the trees involved in the operation; and push the new tree on the top of the stack.
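The shift and reduce semantics just described can be sketched as a small loop over a stack and an input list. The decision policy is abstracted as a callable: the patent learns it from a corpus of discourse-parse action sequences, whereas the trivial policy below is supplied purely for illustration.

```python
# Skeleton of the shift-reduce discourse parsing loop. The tree and
# action representations here are assumptions for illustration.

def shift_reduce_parse(edts, decide):
    """edts: input list of elementary discourse trees (any objects).
    decide(stack, input_list) -> ("shift",)
                              or ("reduce", relation, statuses)."""
    stack, inp = [], list(edts)
    while inp or len(stack) > 1:
        action = decide(stack, inp)
        if action[0] == "shift":
            stack.append(inp.pop(0))      # first edt of input -> top of stack
        else:
            right = stack.pop()           # pop the two topmost trees,
            left = stack.pop()            # combine them, and push the result
            _, relation, statuses = action
            stack.append({"relation": relation, "statuses": statuses,
                          "children": [left, right]})
    return stack[0]                       # single tree spanning the whole text

# Trivial policy: shift while input remains, then reduce nucleus-satellite.
policy = lambda s, i: ("shift",) if i else ("reduce", "ELABORATION", "ns")
result = shift_reduce_parse(["edu1", "edu2", "edu3"], policy)
```

A real policy would also update promotion sets during each reduce, and would choose among the six reduce variants (ns, sn, nn, and the below forms) according to the learned rules.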
- FIG. 4 shows the actions taken by a shift-reduce discourse parser starting with step i.
- the stack contains 4 partial discourse trees, which span units [1,11], [12,15], [16,17], and [18], and the input list contains the edts that correspond to units whose numbers are higher than or equal to 19.
- the parser decides, based on its predetermined decision rules, to perform a Shift operation. As a result, the edt corresponding to unit 19 becomes the top of the stack.
- the parser performs a "Reduce-Apposition-NS" operation, which combines edts 18 and 19 into a discourse tree whose nucleus is unit 18 and whose satellite is unit 19.
- the rhetorical relation that holds between units 18 and 19 is APPOSITION.
- the trees that span over units [16,17] and [18,19] are combined into a larger tree, using a "Reduce-Attribution-NS" operation.
- the status of the tree [16,17] becomes "nucleus" and the status of the tree [18,19] becomes "satellite."
- the rhetorical relation between the two trees is ATTRIBUTION.
- the trees at the top of the stack are combined using a “Reduce-Elaboration-NS” operation. The effect of the operation is shown at the bottom of FIG. 4.
- the suffixes ns, sn, and nn denote nucleus-satellite, satellite-nucleus, and nucleus-nucleus structures, respectively.
- FIG. 5 illustrates how the statuses and promotion sets associated with the trees involved in the reduce operations are affected in each case.
- the relations that shared some rhetorical meaning were grouped into clusters of rhetorical similarity.
- the cluster named “contrast” contained the contrast-like rhetorical relations of ANTITHESIS, CONTRAST, and CONCESSION.
- the cluster named “evaluation-interpretation” contained the rhetorical relations EVALUATION and INTERPRETATION.
- the cluster named "other" contained rhetorical relations such as QUESTION-ANSWER, PROPORTION, RESTATEMENT, and COMPARISON, which were used very seldom in the corpus.
- the grouping process yielded 17 clusters, each characterized by a generalized rhetorical relation name.
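The clustering step above amounts to a many-to-one mapping from specific relation names to generalized cluster names. The sketch below shows only the clusters named in the text; the full taxonomy grouped 71 relations into 17 clusters, and the default fallback to "other" is an assumption for illustration.

```python
# Illustrative relation-to-cluster mapping; only clusters mentioned in
# the text are shown, and the fallback behavior is assumed.

CLUSTER_OF = {
    "ANTITHESIS": "contrast",
    "CONTRAST": "contrast",
    "CONCESSION": "contrast",
    "EVALUATION": "evaluation-interpretation",
    "INTERPRETATION": "evaluation-interpretation",
    "QUESTION-ANSWER": "other",
    "PROPORTION": "other",
    "RESTATEMENT": "other",
    "COMPARISON": "other",
}

def generalize(relation):
    """Map a specific rhetorical relation to its generalized cluster name."""
    return CLUSTER_OF.get(relation.upper(), "other")
```

Learning over the 17 generalized names rather than the 71 raw relations reduces data sparsity in the training corpus.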
- FIG. 6 is a flowchart of a generalized process 600 for generating decision rules for the discourse segmenter.
- the first step in the process was to build, or otherwise obtain, the training corpus. As discussed above, this corpus was built manually using an annotation tool.
- human annotators looked at text segments and for each lexeme (word or punctuation mark) determined whether an edu boundary existed at the lexeme under consideration and either marked it with a segment break or not, depending on whether an edu boundary existed.
- In step 604, for each lexeme, a set of one or more features was associated with each of the edu boundary decisions, based on the context in which these decisions were made.
- the result of such association is a set of learning cases—essentially, discrete instances that capture the edu-boundary decision-making process for a particular lexeme in a particular context. More specifically, the leaves of the discourse trees that were built manually were used in order to derive the learning cases.
- the classes to be learned, which are associated with each lexeme, are "sentence-break", "edu-break", "start-paren", "end-paren", and "none". Further details of the features used in step 604 for learning follow.
- the local context consists of a window of size 5 (1+2+2) that enumerates the Part-Of-Speech (POS) tags of the lexeme under scrutiny and the two lexemes found immediately before (2) and after it (2).
- POS tags are determined automatically, using the “Brill Tagger,” as described in Eric Brill, “Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging,” Computational Linguistics, 21(4):543-565, which is incorporated by reference.
- Because discourse markers, such as "because" and "and", typically play a major role in rhetorical parsing, also considered was a list of features that specify whether a lexeme found within the local contextual window is a potential discourse marker. Hence, for each lexeme under scrutiny, it is specified whether it is a special orthographical marker, such as a comma, dash, or parenthesis, or whether it is a potential discourse marker, such as "accordingly," "afterwards," and "and."
- the local context also contains features that estimate whether the lexemes within the window are potential abbreviations. In this regard, a hard-coded list of 250 potential abbreviations can be used.
- the global context reflects features that pertain to the boundary identification process. These features specify whether there are any commas, closed parentheses, and dashes before the estimated end of the sentence, whether there are any verbs in the unit under consideration, and whether any discourse marker that introduces expectations was used in the sentence under consideration. These markers include words such as "Although" and "With."
- the decision-based segmenter uses a total of twenty-five features, some of which can take as many as 400 values. When we represent these features in a binary format, we obtain learning examples with 2417 binary features/example.
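- A minimal sketch of how such a learning case might be assembled for one lexeme follows; the marker and abbreviation lists shown are small illustrative stand-ins for the real resources, and the feature names are assumptions.

```python
# Illustrative sketch of building one segmenter learning case.
DISCOURSE_MARKERS = {"because", "and", "although", "accordingly", "afterwards"}
ORTHO_MARKERS = {",", "-", "(", ")"}
ABBREVIATIONS = {"Mr.", "Dr.", "Inc.", "Corp."}  # stand-in for the 250-item list

def learning_case(tokens, pos_tags, i):
    """Encode the local context of the lexeme at position i: the POS tags
    of the lexeme and the two lexemes before and after it (window of 5),
    plus discourse-marker and abbreviation flags for each window slot."""
    case = {}
    for k, j in enumerate(range(i - 2, i + 3)):
        inside = 0 <= j < len(tokens)
        tok = tokens[j] if inside else None
        case["pos_%d" % k] = pos_tags[j] if inside else "NONE"
        case["is_marker_%d" % k] = inside and (
            tok.lower() in DISCOURSE_MARKERS or tok in ORTHO_MARKERS)
        case["is_abbrev_%d" % k] = tok in ABBREVIATIONS
    return case
```

In a real system, the class label (“sentence-break”, “edu-break”, etc.) would be attached to each such case before handing it to the learner.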
- In step 606, a learning algorithm, such as the C4.5 algorithm described in J. Ross Quinlan, “C4.5: Programs for Machine Learning,” Morgan Kaufmann Publishers (1993), is applied to learn a set of decision rules from the learning cases.
- the result is a set of discourse segmenter decision rules 608 that collectively define whether a previously unseen lexeme, given its particular context, represents an edu boundary in the text segment under consideration.
- FIG. 6A shows some of the rules that were learned by the C4.5 program using a binary representation of the features and learning cases extracted from the MUC corpus.
- Rule 1 specifies that if the POS tag of the lexeme that immediately precedes the lexeme under scrutiny is a closed parenthesis and the previous marker recognized during the processing of the current sentence was an open parenthesis, then the action to be taken is to insert an end of parenthetic unit.
- Rule 1 can correctly identify the end of the parenthetic unit at the location marked with the symbol T in sentence (4.1) below.
- Rule 2 can correctly identify the beginning of the parenthetic unit 44 years old in sentence 4.2 because the unit is preceded by a comma and starts with a numeral (CD) followed by a plural noun (NNS).
- Rule 3 identifies the end of a sentence after the occurrence of a DOT (period, question mark, or exclamation mark) that is not preceded or followed by another DOT and that is not followed by a DOUBLEQUOTE. This rule will correctly identify the sentence end after the period in example 4.3, but will not insert a sentence end after the period in example 4.4. However, another rule that is derived automatically will insert a sentence break after the double quote that follows the T mark in example 4.4.
- Rule 4 identifies an edu boundary before the occurrence of an “and” followed by a verb in the past tense (VPT). This rule will correctly identify the marked edu boundary in sentence 4.5.
- Rule 5 inserts edu boundaries before the occurrence of the word “until”, whether or not “until” is followed by a verb. This rule will correctly insert an edu boundary in example 4.6.
- Rule 6 is an automatically derived rule that mirrors the manually derived rule specific to COMMA-like actions in the surface-based unit identification algorithm. Rule 6 will correctly insert an edu boundary after the comma marked in example 4.7, because the marker “While” was used at the beginning of the sentence.
- Rule 7 specifies that no elementary or parenthetical unit boundary should be inserted immediately before a DOT.
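- For illustration, Rules 3 and 7 can be expressed as simple predicates over a token window. This is a hedged paraphrase, not the learned C4.5 output, which conditions on many more features than shown here.

```python
# Hedged paraphrase of Rules 3 and 7 as predicates over adjacent tokens.
DOTS = {".", "?", "!"}

def rule3_sentence_end(prev_tok, tok, next_tok):
    """Rule 3: a DOT not preceded or followed by another DOT and not
    followed by a double quote marks a sentence end."""
    return (tok in DOTS and prev_tok not in DOTS
            and next_tok not in DOTS and next_tok != '"')

def rule7_no_boundary_before_dot(next_tok):
    """Rule 7: no elementary or parenthetical unit boundary may be
    inserted immediately before a DOT."""
    return next_tok not in DOTS
```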
- the rules in FIG. 6A are more complex than typical manually derived rules.
- the automatically derived rules make use not only of orthographic and cue-phrase-specific information, but also of syntactic information, which is encoded as part of speech tags.
- step 606 the C4.5 program was used in order to learn decision trees and rules that classify lexemes as boundaries of sentences, edus, or parenthetical units, or as non-boundaries. Learning was accomplished both from binary representations (when possible) and non-binary representations of the cases. (Learning from binary representations of features in the Brown corpus was too computationally expensive to terminate—the Brown data file had about 0.5 Giga-bytes.) In general the binary representations yielded slightly better results than the non-binary representations and the tree classifiers were slightly better than the rule-based ones.
- Table 1 shows accuracy results of non-binary, decision-tree classifiers. The accuracy figures were computed using a ten-fold cross-validation procedure.
- B1 corresponds to a majority-based baseline classifier that assigns the class “none” to all lexemes.
- B2 corresponds to a baseline classifier that assigns a sentence boundary to every “DOT” (that is, a period (.), question mark (?), or exclamation point (!)) lexeme and a non-boundary to all other lexemes.
- FIG. 7 shows the learning curve that corresponds to the MUC corpus. It suggests that more data can increase the accuracy of the classifier.
- the confusion matrix shown in Table 2 corresponds to a non-binary-based tree classifier that was trained on cases derived from 27 Brown texts and that was tested on cases derived from 3 different Brown texts, which were selected randomly.
- the matrix shows that the segmenter encountered some difficulty with identifying the beginning of parenthetical units and the intra-sentential edu boundaries; for example, it correctly identifies 133 of the 220 edu boundaries.
- the performance is high with respect to recognizing sentence boundaries and ends of parenthetical units.
- the performance with respect to identifying sentence boundaries appears to be close to that of systems aimed at identifying “only” sentence boundaries, such as described in David D. Palmer and Marti A. Hearst, “Adaptive multilingual sentence boundary disambiguation,” Computational Linguistics, 23(2):241-269 (1997) (hereinafter, “Hearst (1997)”), whose accuracy is in the range of 99%.
- FIG. 8 shows a generalized flowchart for a process 800 for generating decision rules for the discourse parser.
- the process 800 can be used to train the discourse parser about when and under what circumstances, and in what sequence, it should perform the various shift-reduce operations.
- In step 802, the process receives as input the training corpus of discourse trees and, for each discourse tree, a set of edus from the discourse segmenter.
- In step 804, for each discourse tree/edu set, the process 800 determines a sequence of shift-reduce operations that reconstructs the discourse tree from the edus in that tree's corresponding set.
- In step 806, the process 800 associates features with each entry in each sequence.
- In step 808, the process 800 applies a learning algorithm (e.g., C4.5) to generate decision rules 810 for the discourse parser.
- the shift-reduce action identifier focuses on the three topmost trees in the stack and the first edt in the input list. These trees are referred to as the trees “in focus.”
- the identifier relies on the following classes of features: structural features, lexical (cue-phrase-like) features, operational features, and semantic-similarity-based features. Each is described in turn.
- Structural features include the following:
- Lexical features include the following:
- Operational features include features that specify what the last five parsing operations performed by the parser were. These features could be generated because, for learning, sequences of shift-reduce operations were used and not discourse trees.
- Semantic-similarity-based features include the following:
- Cosine-based measures of similarity between the bags of words in the promotion sets of the trees in focus, computed as

  sim(S_1, S_2) = \frac{\sum_{t \in S_1 \cap S_2} w_{S_1}(t)\, w_{S_2}(t)}{\sqrt{\sum_{t \in S_1} w_{S_1}(t)^2 \times \sum_{t \in S_2} w_{S_2}(t)^2}}

  where w_S(t) denotes the weight of token t in segment S.
- Wordnet-based measures of similarity between the bags of words in the promotion sets of the trees in focus. Fourteen Wordnet-based measures of similarity were used, one for each Wordnet relation (Fellbaum, Wordnet: An Electronic Lexical Database, The MIT Press, 1998). Each of these similarities is computed using a metric similar to the cosine-based metric. Wordnet-based similarities reflect the degree of synonymy, antonymy, meronymy, hyponymy, and the like between the textual segments subsumed by the trees in focus. The Wordnet-based similarities are computed over the tokens that are found in the promotion units associated with each segment.
- the Wordnet-based similarities between the two segments can be computed using the formula shown below, where the function wordnetRelation(w1, w2) returns 1 if there exists a Wordnet relation of type R between the words w1 and w2, and 0 otherwise.
- sim_{wordnetRelation}(W_1, W_2) = \frac{\sum_{w_1 \in W_1, w_2 \in W_2} wordnetRelation(w_1, w_2)}{|W_1| \times |W_2|}
- the Wordnet-based similarity function takes values in the interval [0,1]: the larger the value, the more similar with respect to a given Wordnet relation the two segments are.
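- Both similarity measures can be sketched directly from their definitions. In the sketch below, `related` is a hypothetical stand-in for a real Wordnet relation lookup, and the weighted-bag representation is an assumption.

```python
import math

def cosine_sim(w1, w2):
    """Cosine similarity between two weighted bags of words; w1 and w2
    map each token t to its weight w_S(t) in segment S."""
    num = sum(w1[t] * w2[t] for t in set(w1) & set(w2))
    den = math.sqrt(sum(v * v for v in w1.values())) * \
          math.sqrt(sum(v * v for v in w2.values()))
    return num / den if den else 0.0

def relation_sim(W1, W2, related):
    """Wordnet-relation similarity: the number of word pairs in W1 x W2
    standing in the relation, divided by |W1| * |W2|; lies in [0, 1]."""
    if not W1 or not W2:
        return 0.0
    hits = sum(1 for a in W1 for b in W2 if related(a, b))
    return hits / (len(W1) * len(W2))
```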
- FIG. 8A shows some of the rules that were learned by the C4.5 program using a binary representation of the features and learning cases extracted from the MUC corpus.
- Rule 1, which is similar to a typical rule derived manually, specifies that if the last lexeme in the tree at position top-1 in the stack is a comma and there is a marker “if” that occurs at the beginning of the text that corresponds to the same tree, then the trees at position top-1 and top should be reduced using a REDUCE-CONDITION-SN operation. This operation will make the tree at position top-1 the satellite of the tree at position top.
- Rule 2 makes the tree at the top of the stack the BACKGROUND-CIRCUMSTANCE satellite of the tree at position top-1 when the first word in the text subsumed by the top tree is “when”, which is a while-adverb (WRB), when the second word in the same text is not a gerund or past participle verb (VBG), and when the cosine-based similarity between the text subsumed by the top node in the stack and the first unit in the list of elementary discourse units that have not been shifted to the stack is greater than 0.0793052. If the edt at position top-1 in the stack subsumes unit 1 in example 5.2 and the edt at position top subsumes unit 2, rule 2 will correctly replace the two edts with the rhetorical tree shown in FIG. 8C.
- the action to be applied is REDUCE-BACKGROUND-CIRCUMSTANCE-NS.
- the action to be applied is REDUCE-CONTRAST-NN.
- If the trees at the top of the stack subsume the paragraphs shown in FIG. 8D and are characterized by promotion sets P1 and P2, then as a result of applying rule 4 in FIG. 8A one would obtain a new tree, whose shape is shown in FIG. 8E; the promotion units of the root node of this tree are given by the union of the promotion units of the child nodes.
- the last rule in FIG. 8A reflects the fact that each text in the MUC corpus is characterized by a title.
- Table 3 below displays the accuracy of the shift-reduce action identifiers, determined for each of the three corpora (MUC, Brown, WSJ) by means of a ten-fold cross-validation procedure.
- the B3 column gives the accuracy of a majority-based classifier, which chooses action SHIFT in all cases. Since choosing only the action SHIFT never produces a discourse tree, column B4 presents the accuracy of a baseline classifier that chooses shift-reduce operations randomly, with probabilities that reflect the probability distribution of the operations in each corpus. TABLE 3 Performance of the tree-based, shift-reduce action classifiers.
- FIG. 9 shows the learning curve that corresponds to the MUC corpus. As in the case of the discourse segmenter, this learning curve also suggests that more data can increase the accuracy of the shift-reduce action identifier.
- Evaluation of the rhetorical parser: By applying the two classifiers sequentially, one can derive the rhetorical structure of any text. The performance results presented above suggest how well the discourse segmenter and the shift-reduce action identifier perform with respect to individual cases, but provide no information about the performance of a rhetorical parser that relies on these classifiers. TABLE 4 Performance of the rhetorical parser: labeled (R)ecall and (P)recision. The segmenter is either Decision-Tree-Based (DT) or Manual (M).
- each corpus was partitioned randomly into two sets of texts: 27 texts were used for training and the remaining 3 texts were used for testing.
- the evaluation employs “labeled recall” and “labeled precision” measures, which are extensively used to study the performance of syntactic parsers.
- “Labeled recall” reflects the number of correctly labeled constituents identified by the rhetorical parser with respect to the number of labeled constituents in the corresponding manually built tree.
- “Labeled precision” reflects the number of correctly labeled constituents identified by the rhetorical parser with respect to the total number of labeled constituents identified by the parser.
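- The two measures can be sketched over sets of labeled constituents; representing a constituent as a (start, end, label) triple is an assumption for illustration.

```python
def labeled_scores(predicted, gold):
    """predicted and gold are sets of labeled constituents, e.g.
    (start, end, label) triples. Returns (labeled recall, labeled
    precision) for the parser output against the manually built tree."""
    correct = len(predicted & gold)
    recall = correct / len(gold) if gold else 0.0
    precision = correct / len(predicted) if predicted else 0.0
    return recall, precision
```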
- This section describes a probabilistic approach to the compression problem.
- a “noisy channel” framework is used.
- a long text string is regarded as (1) originally being a short string, to which (2) someone added some additional, optional text. Compression is a matter of identifying the original short string. It is not critical whether the “original” string is real or hypothetical.
- a French string could be regarded as originally being in English, but having noise added to it. The French may or may not have been translated from English originally, but by removing the noise, one can hypothesize an English source—and thereby translate the string.
- the noise consists of optional text material that pads out the core signal.
- Source model: To every string s a probability P(s) must be assigned. P(s) represents the chance that s is generated as an “original short string” in the above hypothetical process. For example, it may be desirable for P(s) to be very low if s is ungrammatical.
- Decoder: Given a long string t, a short string s is searched for that maximizes P(s|t). By Bayes' rule, this is equivalent to searching for the s that maximizes P(s) · P(t|s).
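- The decoder can be sketched as a ranking over candidate compressions. This is a minimal illustration; `p_source` and `p_channel` are hypothetical caller-supplied probability models, not part of the patent's implementation.

```python
def decode(t, candidates, p_source, p_channel):
    """Rank candidate short strings s for a long string t by
    P(s) * P(t|s), the Bayes-rule equivalent of maximizing P(s|t).
    p_source(s) scores how normal-looking s is; p_channel(t, s)
    scores how plausibly t arises from s by adding optional text."""
    return max(candidates, key=lambda s: p_source(s) * p_channel(t, s))
```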
- Good source strings are ones that have both (1) a normal-looking parse tree, and (2) normal-looking word pairs.
- the stochastic channel model performs minimal operations on a small tree s to create a larger tree t.
- an expansion template is chosen probabilistically based on the labels of the node and its children. For example, when processing the S node in the tree above, one may wish to add a prepositional phrase as a third child. This is done with probability P(S → NP VP PP | S → NP VP).
- Once an expansion template is chosen, then for each new child node introduced (if any), a new subtree is grown rooted at that node—for example, (PP (P in) (NP Pittsburgh)). Any particular subtree is grown with probability given by its PCFG factorization, as above (no bigrams).
- FIG. 11 shows examples of parse trees. As shown, the tree t in FIG. 11 spans the string abcde. Consider the parse tree for compression s1, which is also shown in FIG. 11.
- To score this compression, P(s1) and P(t | s1) are computed. Breaking this down further, the source PCFG and word-bigram factors, which describe Ptree(s1), include the factor P(TOP → G | TOP).
- The channel expansion-template factors, which describe P(t | s1), include a factor conditioned on the source rule G → H A.
- a different compression will be scored with a different set of factors. For example, consider a compression of t that leaves t completely untouched. In that case, the source costs Ptree(t) include the factor P(TOP → G | TOP).
- FIG. 12 shows a few examples of sentence pairs extracted from the corpus.
- This corpus was chosen because it is consistent with two desiderata specific to summarization work: (i) the human-written Abstract sentences are grammatical; (ii) the Abstract sentences represent in a compressed form the salient points of the original newspaper sentences. The uncompressed sentences were kept in the corpus as well, since an objective was to learn not only how to compress a sentence, but also when to do it.
- Expansion-template probabilities were collected from the parallel corpus. First, both sides of the parallel corpus were parsed, and then corresponding syntactic nodes were identified. For example, the parse tree for one sentence may begin
- An expansion-template probability can be assigned to each node in the forest. For example, to the B → Q node, one can assign the expansion-template probability P(B → Q R | B → Q).
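- Estimating such expansion-template probabilities by relative frequency can be sketched as follows; the rule-string format and alignment input are assumptions for illustration.

```python
from collections import Counter

def template_probs(aligned_rules):
    """Estimate expansion-template probabilities by relative frequency.
    aligned_rules is a list of (short_rule, long_rule) string pairs
    gathered from aligned nodes, e.g. ("B -> Q", "B -> Q R").
    Returns a dict mapping (short_rule, long_rule) to
    P(long_rule | short_rule)."""
    joint = Counter(aligned_rules)
    marginal = Counter(short for short, _ in aligned_rules)
    return {(s, l): c / marginal[s] for (s, l), c in joint.items()}
```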
- FIGS. 17 and 18 are respectively generalized flowcharts of the channel-based summarization and training processes described above.
- the first step 1702 in the channel-based summarization process 1700 is to receive the input text.
- Although the embodiment described above uses sentences as the input text, any other text segment could be used instead, for example, clauses, paragraphs, or entire treatises.
- In step 1704, the input text is parsed to produce a syntactic tree in the style of FIG. 11, which is used in step 1706 as the basis for generating multiple possible solutions (e.g., the shared-forest structure described above). If a whole text is given as input, the text can be parsed to produce a discourse tree, and the algorithm described here will operate on the discourse tree.
- In step 1708, the multiple possible solutions generated in step 1706 are ranked using pre-generated ranking statistics from a statistical model.
- Step 1706 may involve assigning an expansion-template probability to each node in the forest, as described above.
- the best-scoring candidate (or candidates) is (are) chosen as the final compression solution(s) in step 1710.
- the best-scoring candidate may be the one having the smallest log-probability/compression-length ratio.
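- That selection criterion can be sketched as follows. This is a minimal illustration that treats the score as negative log-probability divided by compression length; the candidate format is an assumption.

```python
import math

def select_best(candidates):
    """candidates: (words, probability) pairs for each compression.
    Dividing the negative log-probability by the compression length
    keeps the criterion from unfairly favoring very short outputs."""
    return min(candidates, key=lambda c: -math.log(c[1]) / len(c[0]))
```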
- FIG. 18 shows a generalized process for training a channel-based summarizer.
- the process 1800 starts in step 1802 with an input training set (or corpus).
- this input training set comprises pairs of long-short text fragments, for example, long/short sentence pairs or treatise/abstract pairs.
- Typically, because a main purpose of the training set is to teach the summarizer how to properly compress text, the training set used will have been generated manually by experienced editors who know how to create relevant, coherent, and grammatical summaries of longer text segments.
- In step 1804, the long-short text pairs are parsed to generate syntactic parse trees such as shown in FIG. 11, thereby resulting in corresponding long-short syntactic tree pairs.
- Each item of text in each pair is parsed individually in this manner. Also, the entire text is parsed using the discourse parser.
- In step 1806, the resulting parse tree pairs are compared—that is, the discourse or syntactic parse tree for a long segment is compared against the discourse or syntactic parse tree for its paired short segment—to identify similarities and differences between nodes of the tree pairs.
- a difference might occur, for example, if, in generating the short segment, an editor deleted a prepositional phrase from the long segment.
- the results of this comparison are “events” that are collected for each of the long/short pairs and stored in a database. In general, two different types of events are detected: “joint events,” which represent a detected correspondence between a long and short segment pair, and Context-Free Grammar (CFG) events, which relate only to characteristics of the short segment in each pair.
- In step 1808, the collected events are normalized to generate probabilities. These normalized events collectively represent the statistical learning model 1810 used by the channel-based summarizer.
- the rewriting process starts with an empty Stack and an Input List that contains the sequence of words subsumed by the large tree t. Each word in the input list is labeled with the name of all syntactic constituents in t that start with that word (see FIG. 14).
- the rewriting module applies an operation that is aimed at reconstructing the smaller tree s2.
- four types of operations are used:
- SHIFT operations transfer the first word from the input list into the stack
- REDUCE operations pop the k syntactic trees located at the top of the stack; combine them into a new tree; and push the new tree on the top of the stack. Reduce operations are used to derive the structure of the syntactic tree of the short sentence.
- DROP operations are used to delete from the input list subsequences of words that correspond to syntactic constituents.
- a DROP X operation deletes from the input list all words that are spanned by constituent X in t.
- ASSIGNTYPE operations are used to change the label of trees at the top of the stack. These actions assign POS tags to the words in the compressed sentence, which may be different from the POS tags in the original sentence.
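- The four operations can be illustrated with a simplified simulation in which trees are nested lists and each input item carries the label of a constituent it belongs to. This is an illustrative simplification, not the patent's implementation; in particular, DROP here deletes by label match rather than by true constituent spans.

```python
# Simplified simulation of the shift-reduce-drop rewriting operations.
def apply_op(stack, inputs, op):
    kind = op[0]
    if kind == "SHIFT":            # move the first input item onto the stack
        stack.append(inputs.pop(0))
    elif kind == "ASSIGNTYPE":     # relabel the tree at the top of the stack
        word, _ = stack[-1]
        stack[-1] = (word, op[1])
    elif kind == "REDUCE":         # pop k trees, combine them under a label
        k, label = op[1], op[2]
        children = stack[-k:]
        del stack[-k:]
        stack.append((children, label))
    elif kind == "DROP":           # delete inputs labeled with constituent op[1]
        inputs[:] = [(w, l) for w, l in inputs if l != op[1]]
    return stack, inputs
```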
- the decision-based model is more flexible than the channel model because it enables the derivation of a tree whose skeleton can differ quite drastically from that of the tree given as input.
- the channel-based model was unable to obtain tree s2 from t.
- the decision-based model was able to rewrite a tree t into any tree s, as long as an in-order traversal of the leaves of s produces a sequence of words that occur in the same order as the words in the tree t.
- the tree s2 can be obtained from the tree t by following this sequence of actions, whose effects are shown in FIG. 14: SHIFT; ASSIGNTYPE H; SHIFT; ASSIGNTYPE K; REDUCE 2 F; DROP B; SHIFT; ASSIGNTYPE D; REDUCE 2 G.
- To save space, the SHIFT and ASSIGNTYPE operations are shown in FIG. 14 on the same line. However, it should be understood that the SHIFT and ASSIGNTYPE operations correspond to two distinct actions.
- the ASSIGNTYPE K operation rewrites the POS tag of the word b; the REDUCE operations modify the skeleton of the tree given as input.
- the input list is shown in FIG. 14 in a format that resembles the graphical representation of the trees in FIG. 11.
- each configuration of our shift-reduce-drop rewriting model is associated with a learning case.
- the learning cases are generated automatically by a program that derives sequences of actions that map each of the large trees in our corpus into smaller trees.
- the rewriting procedure simulates a bottom-up reconstruction of the smaller trees.
- Operational features reflect the number of trees in the stack, the input list, and the types of the last five operations performed. Operational features also encode information that denotes the syntactic category of the root nodes of the partial trees built up to a certain time. Examples of operational features include the following: numberTreesInStack, wasPreviousOperationShift, syntacticLabelOfTreeAtTheTopOfStack.
- Original-tree-specific features denote the syntactic constituents that start with the first unit in the input list. Examples of such features include inputListStartsWithA_CC and inputListStartsWithA_PP.
- the decision-based compression module uses the C4.5 program as described in J. Quinlan, “C4.5: Programs for Machine Learning,” Morgan Kaufmann Publishers (1993), in order to learn decision trees that specify how large syntactic trees can be compressed into shorter trees.
- a ten-fold cross-validation evaluation of the classifier yielded an accuracy of 98.16% (±0.14).
- a majority baseline classifier that chooses the action SHIFT has an accuracy of 28.72%.
- FIG. 18A shows examples of rules that were learned automatically by the C4.5 program.
- Rule 1 enables the deletion of WH prepositional phrases in the context in which they follow other constituents that the program decided to delete.
- Rule 2 enables the deletion of WHNP constituents. Since this deletion is carried out only when the stack contains only one NP constituent, it follows that this rule is applied only in conjunction with complex noun phrases that occur at the beginning of sentences.
- Rule 3 enables the deletion of adjectival phrases.
- the shift-reduce-drop model is applied in a deterministic fashion.
- the sentence to be compressed is parsed and the input list is initialized with the words in the sentence and the syntactic constituents that “begin” at each word, as shown in FIG. 14.
- the learned classifier is asked in a stepwise manner what action to propose. Each action is then simulated, thus incrementally building a parse tree. The procedure ends when the input list is empty and when the stack contains only one tree. An in-order traversal of the leaves of this tree produces the compressed version of the sentence given as input.
- Because the decision-based model is deterministic, it produces only one output.
- An advantage of this result is that compression using the decision-based model is very fast: it takes only a few milliseconds per sentence.
- One potential disadvantage, depending on one's objectives, is that the decision-based model does not produce a range of compressions, from which another system may subsequently choose. It would be relatively straightforward to extend the model within a probabilistic framework by applying, for example, techniques described in D. Magerman, “Statistical decision-tree models for parsing,” Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, 276-283 (1995).
- FIGS. 19 and 20 are respectively generalized flowcharts of the decision-based summarization and training processes described above.
- the first step 1902 in the decision-based summarization process 1900 is to receive the input text.
- Although the embodiment described above uses sentences as the input text, any other text segment could be used instead, for example, clauses, paragraphs, or entire treatises.
- In step 1904, the input text is parsed to produce a syntactic tree in the style of FIG. 11. If a full text is used, one can use a discourse parser to build the discourse tree of the text.
- In step 1906, the shift-reduce-drop algorithm is applied to the syntactic/discourse tree generated in step 1904.
- the shift-reduce-drop algorithm applies a sequence of predetermined decision rules (learned during training of the decision-based model, and identifying under what circumstances, and in what order, to perform the various shift-reduce-drop operations) to produce a compressed syntactic/discourse tree 1908.
- the resulting syntactic/discourse tree can be used for various purposes; for example, it can be rendered into a compressed text segment and output to a user (e.g., either a human end-user or a computer process).
- the resulting syntactic/discourse tree can be supplied to a process that further manipulates the tree for other purposes.
- the resulting compressed syntactic/discourse tree could be supplied to a tree rewriter to convert it into another form, e.g., to translate it into a target language.
- An example of such a tree rewriter is described in Daniel Marcu et al., “The Automatic Translation of Discourse Structures,” Proceedings of the First Annual Meeting of the North American Chapter of the Association for Computational Linguistics, pp. 9-17, Seattle, Wash. (Apr. 29-May 3, 2000).
- FIG. 20 shows a generalized process for training a decision-based summarizer. As shown therein, the training process 2000 starts in step 2002 with an input training set as discussed above with reference to FIG. 17.
- In step 2004, the long-short text pairs are parsed to generate syntactic parse trees such as shown in FIG. 11, thereby resulting in corresponding long-short syntactic tree pairs.
- In step 2006, for each long-short tree pair, the training process 2000 determines a sequence of shift-reduce-drop operations that will convert the long tree into the short tree. As discussed above, this step is performed based on the following four basic operations, referred to collectively as the “shift-reduce-drop” operations—shift, reduce, drop, and assignType. These four operations are sufficient to rewrite any given long tree into its paired short tree, provided that the order of the leaves does not change.
- the output of step 2006 is a set of learning cases—one learning case for each long-short tree pair in the training set.
- each learning case is an ordered set of shift-reduce-drop operations that when applied to a long tree will generate the paired short tree.
- In step 2008, the training process 2000 associates features (e.g., operational and original-tree-specific features) with the learning cases to reflect the context in which the operations are to be performed.
- In step 2010, the training process 2000 applies a learning algorithm, for example, the C4.5 algorithm described in J. Ross Quinlan, “C4.5: Programs for Machine Learning,” Morgan Kaufmann Publishers (1993), to learn a set of decision rules 2012 from the learning cases.
- This set of decision rules 2012 then can be used by the decision-based summarizer to summarize any previously unseen text or syntactic tree into a compressed version that is both coherent and grammatical.
- Evaluation of the Summarizer Models: To evaluate the compression algorithms, 32 sentence pairs were randomly selected from the parallel corpus. This random subset is referred to as the Test Corpus. The other 1035 sentence pairs were used for training as described above.
- FIG. 15 shows three sentences from the Test Corpus, together with the compressions produced by humans, the two compression algorithms described here (channel-based and decision-based), and a baseline algorithm that produces compressions with highest word-bigram scores. The examples were chosen so as to reflect good, average, and bad performance cases.
- the first sentence in FIG. 15 (“Beyond the basic level, the operations of the three products vary widely.”) was compressed in the same manner by humans and by both the channel-based and decision-based algorithms (the baseline algorithm, though, chooses not to compress this sentence).
- the output of the Decision-based algorithm is grammatical, but the semantics are negatively affected.
- the noisy-channel algorithm deletes only the word “break”, which affects the correctness of the output less.
- the noisy-channel model is again more conservative and decides not to drop any constituents.
- the decision-based algorithm compresses the input substantially, but it fails to produce a grammatical output.
- Table 5 shows the compression rate, and the mean and standard deviation of results across all judges, for each algorithm and corpus.
- the results show that the decision-based algorithm is the most aggressive: on average, it compresses sentences to about half of their original size.
- the compressed sentences produced by both the channel-based algorithm and by the decision-based algorithm are more “grammatical” and contain more important words than the sentences produced by the baseline.
- T-test experiments showed these differences to be statistically significant at p < 0.01 both for individual judges and for average scores across all judges. T-tests showed no significant statistical differences between the two algorithms.
- As Table 5 shows, the performance of each of the compression algorithms is much closer to human performance than baseline performance; yet, humans perform statistically better than our algorithms at p < 0.01.
- noisy-channel modeling could be enhanced by taking into account subcategory and head-modifier statistics (in addition to simple word-bigrams).
- the subject of a sentence may be separated from the verb by intervening prepositional phrases.
- statistics should be collected over subject/verb pairs, which can be extracted from parsed text.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/854,301 US20020046018A1 (en) | 2000-05-11 | 2001-05-11 | Discourse parsing and summarization |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US20364300P | 2000-05-11 | 2000-05-11 | |
US09/854,301 US20020046018A1 (en) | 2000-05-11 | 2001-05-11 | Discourse parsing and summarization |
Publications (1)
Publication Number | Publication Date |
---|---|
US20020046018A1 true US20020046018A1 (en) | 2002-04-18 |
Family
ID=22754752
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/854,327 Expired - Lifetime US7533013B2 (en) | 2000-05-11 | 2001-05-11 | Machine translation techniques |
US09/854,301 Abandoned US20020046018A1 (en) | 2000-05-11 | 2001-05-11 | Discourse parsing and summarization |
Country Status (7)
Cited By (193)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020138248A1 (en) * | 2001-01-26 | 2002-09-26 | Corston-Oliver Simon H. | Lingustically intelligent text compression |
US20020186241A1 (en) * | 2001-02-15 | 2002-12-12 | Ibm | Digital document browsing system and method thereof |
US20030167252A1 (en) * | 2002-02-26 | 2003-09-04 | Pliant Technologies, Inc. | Topic identification and use thereof in information retrieval systems |
US20030188255A1 (en) * | 2002-03-28 | 2003-10-02 | Fujitsu Limited | Apparatus for and method of generating synchronized contents information, and computer product |
US20040001893A1 (en) * | 2002-02-15 | 2004-01-01 | Stupp Samuel I. | Self-assembly of peptide-amphiphile nanofibers under physiological conditions |
US20040044519A1 (en) * | 2002-08-30 | 2004-03-04 | Livia Polanyi | System and method for summarization combining natural language generation with structural analysis |
WO2004046956A1 (en) * | 2002-11-14 | 2004-06-03 | Educational Testing Service | Automated evaluation of overly repetitive word use in an essay |
US20040117734A1 (en) * | 2002-09-30 | 2004-06-17 | Frank Krickhahn | Method and apparatus for structuring texts |
US20040167885A1 (en) * | 2002-12-06 | 2004-08-26 | Attensity Corporation | Data products of processes of extracting role related information from free text sources |
US20040230415A1 (en) * | 2003-05-12 | 2004-11-18 | Stefan Riezler | Systems and methods for grammatical text condensation |
US20040258726A1 (en) * | 2003-02-11 | 2004-12-23 | Stupp Samuel I. | Methods and materials for nanocrystalline surface coatings and attachment of peptide amphiphile nanofibers thereon |
US20050038643A1 (en) * | 2003-07-02 | 2005-02-17 | Philipp Koehn | Statistical noun phrase translation |
US20050086592A1 (en) * | 2003-10-15 | 2005-04-21 | Livia Polanyi | Systems and methods for hybrid text summarization |
EP1535261A1 (en) * | 2002-06-24 | 2005-06-01 | Educational Testing Service | Automated essay annotation system and method |
US20050138556A1 (en) * | 2003-12-18 | 2005-06-23 | Xerox Corporation | Creation of normalized summaries using common domain models for input text analysis and output text generation |
US20050137855A1 (en) * | 2003-12-19 | 2005-06-23 | Maxwell John T.Iii | Systems and methods for the generation of alternate phrases from packed meaning |
US20050170325A1 (en) * | 2002-02-22 | 2005-08-04 | Steinberg Linda S. | Portal assessment design system for educational testing |
US20050187772A1 (en) * | 2004-02-25 | 2005-08-25 | Fuji Xerox Co., Ltd. | Systems and methods for synthesizing speech using discourse function level prosodic features |
US20050208589A1 (en) * | 2003-12-05 | 2005-09-22 | Stupp Samuel I | Branched peptide amphiphiles, related epitope compounds and self assembled structures thereof |
US20050209145A1 (en) * | 2003-12-05 | 2005-09-22 | Stupp Samuel I | Self-assembling peptide amphiphiles and related methods for growth factor delivery |
US20050221266A1 (en) * | 2004-04-02 | 2005-10-06 | Mislevy Robert J | System and method for assessment design |
US6961692B1 (en) * | 2000-08-01 | 2005-11-01 | Fuji Xerox Co, Ltd. | System and method for writing analysis using the linguistic discourse model |
US20050256848A1 (en) * | 2004-05-13 | 2005-11-17 | International Business Machines Corporation | System and method for user rank search |
US20060004732A1 (en) * | 2002-02-26 | 2006-01-05 | Odom Paul S | Search engine methods and systems for generating relevant search results and advertisements |
US20060010138A1 (en) * | 2004-07-09 | 2006-01-12 | International Business Machines Corporation | Method and system for efficient representation, manipulation, communication, and search of hierarchical composite named entities |
US20060009961A1 (en) * | 2004-06-23 | 2006-01-12 | Ning-Ping Chan | Method of decomposing prose elements in document processing |
US20060020571A1 (en) * | 2004-07-26 | 2006-01-26 | Patterson Anna L | Phrase-based generation of document descriptions |
US20060020473A1 (en) * | 2004-07-26 | 2006-01-26 | Atsuo Hiroe | Method, apparatus, and program for dialogue, and storage medium including a program stored therein |
US20060020607A1 (en) * | 2004-07-26 | 2006-01-26 | Patterson Anna L | Phrase-based indexing in an information retrieval system |
US20060031195A1 (en) * | 2004-07-26 | 2006-02-09 | Patterson Anna L | Phrase-based searching in an information retrieval system |
US20060095250A1 (en) * | 2004-11-03 | 2006-05-04 | Microsoft Corporation | Parser for natural language processing |
US20060116860A1 (en) * | 2004-11-30 | 2006-06-01 | Xerox Corporation | Systems and methods for user-interest sensitive condensation |
US20060142995A1 (en) * | 2004-10-12 | 2006-06-29 | Kevin Knight | Training for a text-to-text application which uses string to tree conversion for training and decoding |
US20060149036A1 (en) * | 2002-11-12 | 2006-07-06 | Stupp Samuel I | Composition and method for self-assembly and mineralizatin of peptide amphiphiles |
US20060155530A1 (en) * | 2004-12-14 | 2006-07-13 | International Business Machines Corporation | Method and apparatus for generation of text documents |
US20060194183A1 (en) * | 2005-02-28 | 2006-08-31 | Yigal Attali | Method of model scaling for an automated essay scoring system |
US20060247165A1 (en) * | 2005-01-21 | 2006-11-02 | Stupp Samuel I | Methods and compositions for encapsulation of cells |
US20060277028A1 (en) * | 2005-06-01 | 2006-12-07 | Microsoft Corporation | Training a statistical parser on noisy data by filtering |
US20060294155A1 (en) * | 2004-07-26 | 2006-12-28 | Patterson Anna L | Detecting spam documents in a phrase based information retrieval system |
US20070192309A1 (en) * | 2005-10-12 | 2007-08-16 | Gordon Fischer | Method and system for identifying sentence boundaries |
US20070240078A1 (en) * | 2004-12-21 | 2007-10-11 | Palo Alto Research Center Incorporated | Systems and methods for using and constructing user-interest sensitive indicators of search results |
US20070260449A1 (en) * | 2006-05-02 | 2007-11-08 | Shimei Pan | Instance-based sentence boundary determination by optimization |
US20070260598A1 (en) * | 2005-11-29 | 2007-11-08 | Odom Paul S | Methods and systems for providing personalized contextual search results |
US20070265996A1 (en) * | 2002-02-26 | 2007-11-15 | Odom Paul S | Search engine methods and systems for displaying relevant topics |
US20070277250A1 (en) * | 2005-03-04 | 2007-11-29 | Stupp Samuel I | Angiogenic heparin-binding epitopes, peptide amphiphiles, self-assembled compositions and related methods of use |
US20080033715A1 (en) * | 2002-01-14 | 2008-02-07 | Microsoft Corporation | System for normalizing a discourse representation structure and normalized data structure |
WO2007117652A3 (en) * | 2006-04-07 | 2008-05-02 | Basis Technology Corp | Method and system of machine translation |
US7426507B1 (en) | 2004-07-26 | 2008-09-16 | Google, Inc. | Automatic taxonomy generation in search results using phrases |
US20080270109A1 (en) * | 2004-04-16 | 2008-10-30 | University Of Southern California | Method and System for Translating Information with a Higher Probability of a Correct Translation |
US20080306943A1 (en) * | 2004-07-26 | 2008-12-11 | Anna Lynn Patterson | Phrase-based detection of duplicate documents in an information retrieval system |
US20080319971A1 (en) * | 2004-07-26 | 2008-12-25 | Anna Lynn Patterson | Phrase-based personalization of searches in an information retrieval system |
US20090006080A1 (en) * | 2007-06-29 | 2009-01-01 | Fujitsu Limited | Computer-readable medium having sentence dividing program stored thereon, sentence dividing apparatus, and sentence dividing method |
US20090042804A1 (en) * | 2007-04-17 | 2009-02-12 | Hulvat James F | Novel peptide amphiphiles having improved solubility and methods of using same |
US7491690B2 (en) | 2001-11-14 | 2009-02-17 | Northwestern University | Self-assembly and mineralization of peptide-amphiphile nanofibers |
US20090045971A1 (en) * | 2006-03-06 | 2009-02-19 | Koninklijke Philips Electronics N.V. | Use of decision trees for automatic commissioning |
US7534761B1 (en) | 2002-08-21 | 2009-05-19 | North Western University | Charged peptide-amphiphile solutions and self-assembled peptide nanofiber networks formed therefrom |
US7567959B2 (en) | 2004-07-26 | 2009-07-28 | Google Inc. | Multiple index based information retrieval system |
US20090306964A1 (en) * | 2008-06-06 | 2009-12-10 | Olivier Bonnet | Data detection |
US20090326927A1 (en) * | 2008-06-27 | 2009-12-31 | Microsoft Corporation | Adaptive generation of out-of-dictionary personalized long words |
US20100042398A1 (en) * | 2002-03-26 | 2010-02-18 | Daniel Marcu | Building A Translation Lexicon From Comparable, Non-Parallel Corpora |
US7683025B2 (en) | 2002-11-14 | 2010-03-23 | Northwestern University | Synthesis and self-assembly of ABC triblock bola peptide amphiphiles |
US7693813B1 (en) | 2007-03-30 | 2010-04-06 | Google Inc. | Index server architecture using tiered and sharded phrase posting lists |
US7702614B1 (en) | 2007-03-30 | 2010-04-20 | Google Inc. | Index updating using segment swapping |
US7702618B1 (en) | 2004-07-26 | 2010-04-20 | Google Inc. | Information retrieval system for archiving multiple document versions |
US20100131274A1 (en) * | 2008-11-26 | 2010-05-27 | At&T Intellectual Property I, L.P. | System and method for dialog modeling |
US20100169359A1 (en) * | 2008-12-30 | 2010-07-01 | Barrett Leslie A | System, Method, and Apparatus for Information Extraction of Textual Documents |
US20100174524A1 (en) * | 2004-07-02 | 2010-07-08 | Philipp Koehn | Empirical Methods for Splitting Compound Words with Application to Machine Translation |
US20100266557A1 (en) * | 2009-04-13 | 2010-10-21 | Northwestern University | Novel peptide-based scaffolds for cartilage regeneration and methods for their use |
US7827029B2 (en) * | 2004-11-30 | 2010-11-02 | Palo Alto Research Center Incorporated | Systems and methods for user-interest sensitive note-taking |
US20100318348A1 (en) * | 2002-05-20 | 2010-12-16 | Microsoft Corporation | Applying a structured language model to information extraction |
US7925496B1 (en) * | 2007-04-23 | 2011-04-12 | The United States Of America As Represented By The Secretary Of The Navy | Method for summarizing natural language text |
US7925655B1 (en) | 2007-03-30 | 2011-04-12 | Google Inc. | Query scheduling using hierarchical tiers of index servers |
US20110225104A1 (en) * | 2010-03-09 | 2011-09-15 | Radu Soricut | Predicting the Cost Associated with Translating Textual Content |
US20110282651A1 (en) * | 2010-05-11 | 2011-11-17 | Microsoft Corporation | Generating snippets based on content features |
US8086594B1 (en) | 2007-03-30 | 2011-12-27 | Google Inc. | Bifurcated document relevance scoring |
US20120035912A1 (en) * | 2010-07-30 | 2012-02-09 | Ben-Gurion University Of The Negev Research And Development Authority | Multilingual sentence extractor |
US8117223B2 (en) | 2007-09-07 | 2012-02-14 | Google Inc. | Integrating external related phrase information into a phrase-based indexing information retrieval system |
US20120065960A1 (en) * | 2010-09-14 | 2012-03-15 | International Business Machines Corporation | Generating parser combination by combining language processing parsers |
US8166045B1 (en) | 2007-03-30 | 2012-04-24 | Google Inc. | Phrase extraction using subphrase scoring |
US8166021B1 (en) | 2007-03-30 | 2012-04-24 | Google Inc. | Query phrasification |
US20120109945A1 (en) * | 2010-10-29 | 2012-05-03 | Emilia Maria Lapko | Method and system of improving navigation within a set of electronic documents |
US20120143595A1 (en) * | 2010-12-06 | 2012-06-07 | Xin Li | Fast title/summary extraction from long descriptions |
US8214196B2 (en) | 2001-07-03 | 2012-07-03 | University Of Southern California | Syntax-based statistical translation model |
US8296127B2 (en) | 2004-03-23 | 2012-10-23 | University Of Southern California | Discovery of parallel text portions in comparable collections of corpora and training using comparable texts |
US8380486B2 (en) | 2009-10-01 | 2013-02-19 | Language Weaver, Inc. | Providing machine-generated translations and corresponding trust levels |
US8433556B2 (en) | 2006-11-02 | 2013-04-30 | University Of Southern California | Semi-supervised training for statistical word alignment |
US8468149B1 (en) | 2007-01-26 | 2013-06-18 | Language Weaver, Inc. | Multi-lingual online community |
US8615389B1 (en) | 2007-03-16 | 2013-12-24 | Language Weaver, Inc. | Generation and exploitation of an approximate language model |
US20140067379A1 (en) * | 2011-11-29 | 2014-03-06 | Sk Telecom Co., Ltd. | Automatic sentence evaluation device using shallow parser to automatically evaluate sentence, and error detection apparatus and method of the same |
US8676563B2 (en) | 2009-10-01 | 2014-03-18 | Language Weaver, Inc. | Providing human-generated and machine-generated trusted translations |
US8694303B2 (en) | 2011-06-15 | 2014-04-08 | Language Weaver, Inc. | Systems and methods for tuning parameters in statistical machine translation |
US8805677B1 (en) * | 2010-02-10 | 2014-08-12 | West Corporation | Processing natural language grammar |
US8825466B1 (en) | 2007-06-08 | 2014-09-02 | Language Weaver, Inc. | Modification of annotated bilingual segment pairs in syntax-based machine translation |
US8831928B2 (en) | 2007-04-04 | 2014-09-09 | Language Weaver, Inc. | Customizable machine translation service |
US8886517B2 (en) | 2005-06-17 | 2014-11-11 | Language Weaver, Inc. | Trust scoring for language translation systems |
US8886515B2 (en) | 2011-10-19 | 2014-11-11 | Language Weaver, Inc. | Systems and methods for enhancing machine translation post edit review processes |
US8886518B1 (en) | 2006-08-07 | 2014-11-11 | Language Weaver, Inc. | System and method for capitalizing machine translated text |
US8914279B1 (en) * | 2011-09-23 | 2014-12-16 | Google Inc. | Efficient parsing with structured prediction cascades |
US8942973B2 (en) | 2012-03-09 | 2015-01-27 | Language Weaver, Inc. | Content page URL translation |
US8943080B2 (en) | 2006-04-07 | 2015-01-27 | University Of Southern California | Systems and methods for identifying parallel documents and sentence fragments in multilingual document collections |
US8990064B2 (en) | 2009-07-28 | 2015-03-24 | Language Weaver, Inc. | Translating documents based on content |
WO2015003143A3 (en) * | 2013-07-03 | 2015-05-14 | Thomson Reuters Global Resources | Method and system for simplifying implicit rhetorical relation prediction in large scale annotated corpus |
US20150205786A1 (en) * | 2012-07-31 | 2015-07-23 | Nec Corporation | Problem situation detection device, problem situation detection method and problem situation detection-use program |
US9122674B1 (en) | 2006-12-15 | 2015-09-01 | Language Weaver, Inc. | Use of annotations in statistical machine translation |
US9152622B2 (en) | 2012-11-26 | 2015-10-06 | Language Weaver, Inc. | Personalized machine translation via online adaptation |
US9213694B2 (en) | 2013-10-10 | 2015-12-15 | Language Weaver, Inc. | Efficient online domain adaptation |
WO2015191061A1 (en) * | 2014-06-11 | 2015-12-17 | Hewlett-Packard Development Company, L.P. | Functional summarization of non-textual content based on a meta-algorithmic pattern |
US9336186B1 (en) * | 2013-10-10 | 2016-05-10 | Google Inc. | Methods and apparatus related to sentence compression |
US9336185B1 (en) * | 2012-09-18 | 2016-05-10 | Amazon Technologies, Inc. | Generating an electronic publication sample |
US9355372B2 (en) | 2013-07-03 | 2016-05-31 | Thomson Reuters Global Resources | Method and system for simplifying implicit rhetorical relation prediction in large scale annotated corpus |
US20160283588A1 (en) * | 2015-03-27 | 2016-09-29 | Fujitsu Limited | Generation apparatus and method |
JP2016186772A (ja) * | 2015-03-27 | 2016-10-27 | Fujitsu Limited | Shortened sentence generation apparatus, method, and program |
US9483568B1 (en) | 2013-06-05 | 2016-11-01 | Google Inc. | Indexing system |
US9501506B1 (en) | 2013-03-15 | 2016-11-22 | Google Inc. | Indexing system |
US9582501B1 (en) * | 2014-06-16 | 2017-02-28 | Yseop Sa | Techniques for automatic generation of natural language text |
US20170132529A1 (en) * | 2000-09-28 | 2017-05-11 | Intel Corporation | Method and Apparatus for Extracting Entity Names and Their Relations |
US20180011833A1 (en) * | 2015-02-02 | 2018-01-11 | National Institute Of Information And Communications Technology | Syntax analyzing device, learning device, machine translation device and storage medium |
US9916306B2 (en) | 2012-10-19 | 2018-03-13 | Sdl Inc. | Statistical linguistic analysis of source content |
US9954794B2 (en) | 2001-01-18 | 2018-04-24 | Sdl Inc. | Globalization management system and method therefor |
US9984054B2 (en) | 2011-08-24 | 2018-05-29 | Sdl Inc. | Web interface including the review and manipulation of a web document and utilizing permission based control |
US10061749B2 (en) | 2011-01-29 | 2018-08-28 | Sdl Netherlands B.V. | Systems and methods for contextual vocabularies and customer segmentation |
US20180329879A1 (en) * | 2017-05-10 | 2018-11-15 | Oracle International Corporation | Enabling rhetorical analysis via the use of communicative discourse trees |
US10140320B2 (en) | 2011-02-28 | 2018-11-27 | Sdl Inc. | Systems, methods, and media for generating analytical data |
US20180365228A1 (en) * | 2017-06-15 | 2018-12-20 | Oracle International Corporation | Tree kernel learning for text classification into classes of intent |
US10198438B2 (en) | 1999-09-17 | 2019-02-05 | Sdl Inc. | E-services translation utilizing machine translation and translation memory |
US10248650B2 (en) | 2004-03-05 | 2019-04-02 | Sdl Inc. | In-context exact (ICE) matching |
WO2019067869A1 (en) * | 2017-09-28 | 2019-04-04 | Oracle International Corporation | DETERMINING RHETORIC RELATIONSHIPS BETWEEN DOCUMENTS BASED ON THE ANALYSIS AND IDENTIFICATION OF NAMED ENTITIES |
US10261994B2 (en) | 2012-05-25 | 2019-04-16 | Sdl Inc. | Method and system for automatic management of reputation of translators |
US20190138595A1 (en) * | 2017-05-10 | 2019-05-09 | Oracle International Corporation | Enabling chatbots by detecting and supporting affective argumentation |
US10319252B2 (en) * | 2005-11-09 | 2019-06-11 | Sdl Inc. | Language capability assessment and training apparatus and techniques |
US20190272323A1 (en) * | 2017-05-10 | 2019-09-05 | Oracle International Corporation | Enabling chatbots by validating argumentation |
US10452740B2 (en) | 2012-09-14 | 2019-10-22 | Sdl Netherlands B.V. | External content libraries |
WO2019217722A1 (en) * | 2018-05-09 | 2019-11-14 | Oracle International Corporation | Constructing imaginary discourse trees to improve answering convergent questions |
US10572928B2 (en) | 2012-05-11 | 2020-02-25 | Fredhopper B.V. | Method and system for recommending products based on a ranking cocktail |
US10580015B2 (en) | 2011-02-25 | 2020-03-03 | Sdl Netherlands B.V. | Systems, methods, and media for executing and optimizing online marketing initiatives |
US10599885B2 (en) * | 2017-05-10 | 2020-03-24 | Oracle International Corporation | Utilizing discourse structure of noisy user-generated content for chatbot learning |
US10614167B2 (en) | 2015-10-30 | 2020-04-07 | Sdl Plc | Translation review workflow systems and methods |
US20200117709A1 (en) * | 2018-10-16 | 2020-04-16 | Oracle International Corporation | Constructing conclusive answers for autonomous agents |
US10635863B2 (en) | 2017-10-30 | 2020-04-28 | Sdl Inc. | Fragment recall and adaptive automated translation |
US10657540B2 (en) | 2011-01-29 | 2020-05-19 | Sdl Netherlands B.V. | Systems, methods, and media for web content management |
US10679011B2 (en) * | 2017-05-10 | 2020-06-09 | Oracle International Corporation | Enabling chatbots by detecting and supporting argumentation |
US10685189B2 (en) * | 2016-11-17 | 2020-06-16 | Goldman Sachs & Co. LLC | System and method for coupled detection of syntax and semantics for natural language understanding and generation |
US10706236B1 (en) * | 2018-06-28 | 2020-07-07 | Narrative Science Inc. | Applied artificial intelligence technology for using natural language processing and concept expression templates to train a natural language generation system |
US10719542B1 (en) | 2017-02-17 | 2020-07-21 | Narrative Science Inc. | Applied artificial intelligence technology for ontology building to support natural language generation (NLG) using composable communication goals |
US10755042B2 (en) | 2011-01-07 | 2020-08-25 | Narrative Science Inc. | Automatic generation of narratives from data using communication goals and narrative analytics |
US10796099B2 (en) | 2017-09-28 | 2020-10-06 | Oracle International Corporation | Enabling autonomous agents to discriminate between questions and requests |
US10817676B2 (en) | 2017-12-27 | 2020-10-27 | Sdl Inc. | Intelligent routing services and systems |
US10853583B1 (en) | 2016-08-31 | 2020-12-01 | Narrative Science Inc. | Applied artificial intelligence technology for selective control over narrative generation from visualizations of data |
US10943069B1 (en) | 2017-02-17 | 2021-03-09 | Narrative Science Inc. | Applied artificial intelligence technology for narrative generation based on a conditional outcome framework |
US10949623B2 (en) | 2018-01-30 | 2021-03-16 | Oracle International Corporation | Using communicative discourse trees to detect a request for an explanation |
US10963649B1 (en) | 2018-01-17 | 2021-03-30 | Narrative Science Inc. | Applied artificial intelligence technology for narrative generation using an invocable analysis service and configuration-driven analytics |
US10990767B1 (en) | 2019-01-28 | 2021-04-27 | Narrative Science Inc. | Applied artificial intelligence technology for adaptive natural language understanding |
US11003838B2 (en) | 2011-04-18 | 2021-05-11 | Sdl Inc. | Systems and methods for monitoring post translation editing |
US11042709B1 (en) | 2018-01-02 | 2021-06-22 | Narrative Science Inc. | Context saliency-based deictic parser for natural language processing |
US11068661B1 (en) | 2017-02-17 | 2021-07-20 | Narrative Science Inc. | Applied artificial intelligence technology for narrative generation based on smart attributes |
US11074402B1 (en) * | 2020-04-07 | 2021-07-27 | International Business Machines Corporation | Linguistically consistent document annotation |
US11100144B2 (en) | 2017-06-15 | 2021-08-24 | Oracle International Corporation | Data loss prevention system for cloud security based on document discourse analysis |
US11170038B1 (en) | 2015-11-02 | 2021-11-09 | Narrative Science Inc. | Applied artificial intelligence technology for using narrative analytics to automatically generate narratives from multiple visualizations |
US11182412B2 (en) * | 2017-09-27 | 2021-11-23 | Oracle International Corporation | Search indexing using discourse trees |
US11222184B1 (en) | 2015-11-02 | 2022-01-11 | Narrative Science Inc. | Applied artificial intelligence technology for using narrative analytics to automatically generate narratives from bar charts |
US11232268B1 (en) | 2015-11-02 | 2022-01-25 | Narrative Science Inc. | Applied artificial intelligence technology for using narrative analytics to automatically generate narratives from line charts |
US11238090B1 (en) | 2015-11-02 | 2022-02-01 | Narrative Science Inc. | Applied artificial intelligence technology for using narrative analytics to automatically generate narratives from visualization data |
US11256867B2 (en) | 2018-10-09 | 2022-02-22 | Sdl Inc. | Systems and methods of machine learning for digital assets and message creation |
US11263394B2 (en) * | 2019-08-02 | 2022-03-01 | Adobe Inc. | Low-resource sentence compression system |
US11275892B2 (en) | 2019-04-29 | 2022-03-15 | International Business Machines Corporation | Traversal-based sentence span judgements |
US11308528B2 (en) | 2012-09-14 | 2022-04-19 | Sdl Netherlands B.V. | Blueprinting of multimedia assets |
US11321536B2 (en) * | 2019-02-13 | 2022-05-03 | Oracle International Corporation | Chatbot conducting a virtual social dialogue |
US11334720B2 (en) | 2019-04-17 | 2022-05-17 | International Business Machines Corporation | Machine learned sentence span inclusion judgments |
US11373632B2 (en) * | 2017-05-10 | 2022-06-28 | Oracle International Corporation | Using communicative discourse trees to create a virtual persuasive dialogue |
US11386186B2 (en) | 2012-09-14 | 2022-07-12 | Sdl Netherlands B.V. | External content library connector systems and methods |
US11386274B2 (en) * | 2017-05-10 | 2022-07-12 | Oracle International Corporation | Using communicative discourse trees to detect distributed incompetence |
US20220284194A1 (en) * | 2017-05-10 | 2022-09-08 | Oracle International Corporation | Using communicative discourse trees to detect distributed incompetence |
US11449682B2 (en) | 2019-08-29 | 2022-09-20 | Oracle International Corporation | Adjusting chatbot conversation to user personality and mood |
US11455494B2 (en) | 2018-05-30 | 2022-09-27 | Oracle International Corporation | Automated building of expanded datasets for training of autonomous agents |
US20220318513A9 (en) * | 2017-05-10 | 2022-10-06 | Oracle International Corporation | Discourse parsing using semantic and syntactic relations |
US11475210B2 (en) * | 2020-08-31 | 2022-10-18 | Twilio Inc. | Language model for abstractive summarization |
US11501085B2 (en) | 2019-11-20 | 2022-11-15 | Oracle International Corporation | Employing abstract meaning representation to lay the last mile towards reading comprehension |
US11537645B2 (en) * | 2018-01-30 | 2022-12-27 | Oracle International Corporation | Building dialogue structure by using communicative discourse trees |
US11551008B2 (en) | 2019-04-28 | 2023-01-10 | Beijing Xiaomi Intelligent Technology Co., Ltd. | Method and device for text processing |
US11556698B2 (en) * | 2019-10-22 | 2023-01-17 | Oracle International Corporation | Augmenting textual explanations with complete discourse trees |
US11561684B1 (en) | 2013-03-15 | 2023-01-24 | Narrative Science Inc. | Method and system for configuring automatic generation of narratives from data |
US11568148B1 (en) | 2017-02-17 | 2023-01-31 | Narrative Science Inc. | Applied artificial intelligence technology for narrative generation based on explanation communication goals |
US11580298B2 (en) | 2019-11-14 | 2023-02-14 | Oracle International Corporation | Detecting hypocrisy in text |
US11586827B2 (en) * | 2017-05-10 | 2023-02-21 | Oracle International Corporation | Generating desired discourse structure from an arbitrary text |
US11615145B2 (en) | 2017-05-10 | 2023-03-28 | Oracle International Corporation | Converting a document into a chatbot-accessible form via the use of communicative discourse trees |
US11645459B2 (en) * | 2018-07-02 | 2023-05-09 | Oracle International Corporation | Social autonomous agent implementation using lattice queries and relevancy detection |
US11765267B2 (en) | 2020-12-31 | 2023-09-19 | Twilio Inc. | Tool for annotating and reviewing audio conversations |
US11775772B2 (en) | 2019-12-05 | 2023-10-03 | Oracle International Corporation | Chatbot providing a defeating reply |
US11809825B2 (en) | 2017-09-28 | 2023-11-07 | Oracle International Corporation | Management of a focused information sharing dialogue based on discourse trees |
US11809804B2 (en) | 2021-05-26 | 2023-11-07 | Twilio Inc. | Text formatter |
US11847420B2 (en) | 2020-03-05 | 2023-12-19 | Oracle International Corporation | Conversational explainability |
US11922344B2 (en) | 2014-10-22 | 2024-03-05 | Narrative Science Llc | Automatic generation of narratives from data using communication goals and narrative analytics |
US11954445B2 (en) | 2017-02-17 | 2024-04-09 | Narrative Science Llc | Applied artificial intelligence technology for narrative generation based on explanation communication goals |
Families Citing this family (108)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7054803B2 (en) * | 2000-12-19 | 2006-05-30 | Xerox Corporation | Extracting sentence translations from translated documents |
US6990439B2 (en) * | 2001-01-10 | 2006-01-24 | Microsoft Corporation | Method and apparatus for performing machine translation using a unified language model and translation model |
US7734459B2 (en) * | 2001-06-01 | 2010-06-08 | Microsoft Corporation | Automatic extraction of transfer mappings from bilingual corpora |
US7146358B1 (en) | 2001-08-28 | 2006-12-05 | Google Inc. | Systems and methods for using anchor text as parallel corpora for cross-language information retrieval |
CN1578954B (zh) * | 2001-10-29 | 2010-04-14 | British Telecommunications plc | Computer language translation extension system
JP3959453B2 (ja) * | 2002-03-14 | 2007-08-15 | Oki Electric Industry Co., Ltd. | Translation mediation system and translation mediation server
US7634398B2 (en) * | 2002-05-16 | 2009-12-15 | Microsoft Corporation | Method and apparatus for reattaching nodes in a parse structure |
JP2005100335A (ja) * | 2003-09-01 | 2005-04-14 | Advanced Telecommunication Research Institute International | Machine translation apparatus, machine translation computer program, and computer
JP3919771B2 (ja) * | 2003-09-09 | 2007-05-30 | Advanced Telecommunications Research Institute International | Machine translation system, control device therefor, and computer program
US8037102B2 (en) | 2004-02-09 | 2011-10-11 | Robert T. and Virginia T. Jenkins | Manipulating sets of hierarchical data |
US9646107B2 (en) | 2004-05-28 | 2017-05-09 | Robert T. and Virginia T. Jenkins as Trustee of the Jenkins Family Trust | Method and/or system for simplifying tree expressions such as for query reduction |
US7882147B2 (en) * | 2004-06-30 | 2011-02-01 | Robert T. and Virginia T. Jenkins | File location naming hierarchy |
US7620632B2 (en) * | 2004-06-30 | 2009-11-17 | Skyler Technology, Inc. | Method and/or system for performing tree matching |
US7801923B2 (en) | 2004-10-29 | 2010-09-21 | Robert T. and Virginia T. Jenkins as Trustees of the Jenkins Family Trust | Method and/or system for tagging trees |
US7627591B2 (en) | 2004-10-29 | 2009-12-01 | Skyler Technology, Inc. | Method and/or system for manipulating tree expressions |
US7630995B2 (en) | 2004-11-30 | 2009-12-08 | Skyler Technology, Inc. | Method and/or system for transmitting and/or receiving data |
US7636727B2 (en) | 2004-12-06 | 2009-12-22 | Skyler Technology, Inc. | Enumeration of trees from finite number of nodes |
US8316059B1 (en) | 2004-12-30 | 2012-11-20 | Robert T. and Virginia T. Jenkins | Enumeration of rooted partial subtrees |
JP4301515B2 (ja) * | 2005-01-04 | 2009-07-22 | International Business Machines Corporation | Sentence display method, information processing apparatus, information processing system, and program
US8615530B1 (en) | 2005-01-31 | 2013-12-24 | Robert T. and Virginia T. Jenkins as Trustees for the Jenkins Family Trust | Method and/or system for tree transformation |
US7681177B2 (en) | 2005-02-28 | 2010-03-16 | Skyler Technology, Inc. | Method and/or system for transforming between trees and strings |
JP4050755B2 (ja) * | 2005-03-30 | 2008-02-20 | Toshiba Corporation | Communication support apparatus, communication support method, and communication support program
US8356040B2 (en) * | 2005-03-31 | 2013-01-15 | Robert T. and Virginia T. Jenkins | Method and/or system for transforming between trees and arrays |
US7899821B1 (en) | 2005-04-29 | 2011-03-01 | Karl Schiffmann | Manipulation and/or analysis of hierarchical data |
EP1894125A4 (en) * | 2005-06-17 | 2015-12-02 | Nat Res Council Canada | MEANS AND METHOD FOR ADAPTED LANGUAGE TRANSLATION |
US20070010989A1 (en) * | 2005-07-07 | 2007-01-11 | International Business Machines Corporation | Decoding procedure for statistical machine translation |
US7779396B2 (en) * | 2005-08-10 | 2010-08-17 | Microsoft Corporation | Syntactic program language translation |
US8924212B1 (en) * | 2005-08-26 | 2014-12-30 | At&T Intellectual Property Ii, L.P. | System and method for robust access and entry to large structured data using voice form-filling |
EP2511833B1 (en) * | 2006-02-17 | 2020-02-05 | Google LLC | Encoding and adaptive, scalable accessing of distributed translation models |
US9645993B2 (en) | 2006-10-10 | 2017-05-09 | Abbyy Infopoisk Llc | Method and system for semantic searching |
US20080086298A1 (en) * | 2006-10-10 | 2008-04-10 | Anisimovich Konstantin | Method and system for translating sentences between languages
US8548795B2 (en) * | 2006-10-10 | 2013-10-01 | Abbyy Software Ltd. | Method for translating documents from one language into another using a database of translations, a terminology dictionary, a translation dictionary, and a machine translation system |
US9633005B2 (en) | 2006-10-10 | 2017-04-25 | Abbyy Infopoisk Llc | Exhaustive automatic processing of textual information |
US8195447B2 (en) | 2006-10-10 | 2012-06-05 | Abbyy Software Ltd. | Translating sentences between languages using language-independent semantic structures and ratings of syntactic constructions |
US9984071B2 (en) | 2006-10-10 | 2018-05-29 | Abbyy Production Llc | Language ambiguity detection of text |
US9235573B2 (en) | 2006-10-10 | 2016-01-12 | Abbyy Infopoisk Llc | Universal difference measure |
US9047275B2 (en) | 2006-10-10 | 2015-06-02 | Abbyy Infopoisk Llc | Methods and systems for alignment of parallel text corpora |
US8145473B2 (en) | 2006-10-10 | 2012-03-27 | Abbyy Software Ltd. | Deep model statistics method for machine translation |
US8214199B2 (en) * | 2006-10-10 | 2012-07-03 | Abbyy Software, Ltd. | Systems for translating sentences between languages using language-independent semantic structures and ratings of syntactic constructions |
JP5082374B2 (ja) * | 2006-10-19 | 2012-11-28 | Fujitsu Limited | Phrase alignment program, translation program, phrase alignment apparatus, and phrase alignment method
JP4997966B2 (ja) * | 2006-12-28 | 2012-08-15 | Fujitsu Limited | Bilingual example sentence search program, bilingual example sentence search apparatus, and bilingual example sentence search method
US7895030B2 (en) * | 2007-03-16 | 2011-02-22 | International Business Machines Corporation | Visualization method for machine translation |
US8959011B2 (en) | 2007-03-22 | 2015-02-17 | Abbyy Infopoisk Llc | Indicating and correcting errors in machine translation systems |
US7908552B2 (en) * | 2007-04-13 | 2011-03-15 | A-Life Medical Inc. | Mere-parsing with boundary and semantic driven scoping |
US7877251B2 (en) * | 2007-05-07 | 2011-01-25 | Microsoft Corporation | Document translation system |
US9779079B2 (en) * | 2007-06-01 | 2017-10-03 | Xerox Corporation | Authoring system |
US8452585B2 (en) * | 2007-06-21 | 2013-05-28 | Microsoft Corporation | Discriminative syntactic word order model for machine translation |
US8812296B2 (en) | 2007-06-27 | 2014-08-19 | Abbyy Infopoisk Llc | Method and system for natural language dictionary generation |
US8103498B2 (en) * | 2007-08-10 | 2012-01-24 | Microsoft Corporation | Progressive display rendering of processed text |
US8229728B2 (en) * | 2008-01-04 | 2012-07-24 | Fluential, Llc | Methods for using manual phrase alignment data to generate translation models for statistical machine translation |
US20120284015A1 (en) * | 2008-01-28 | 2012-11-08 | William Drewes | Method for Increasing the Accuracy of Subject-Specific Statistical Machine Translation (SMT) |
US9262409B2 (en) | 2008-08-06 | 2016-02-16 | Abbyy Infopoisk Llc | Translation of a selected text fragment of a screen |
US8244519B2 (en) * | 2008-12-03 | 2012-08-14 | Xerox Corporation | Dynamic translation memory using statistical machine translation |
CN101996166B (zh) * | 2009-08-14 | 2015-08-05 | Zhang Longbu | Method for patterned recording of bilingual sentence pairs, and translation method and translation system
US9710429B1 (en) * | 2010-11-12 | 2017-07-18 | Google Inc. | Providing text resources updated with translation input from multiple users |
US20120143593A1 (en) * | 2010-12-07 | 2012-06-07 | Microsoft Corporation | Fuzzy matching and scoring based on direct alignment |
US8903707B2 (en) | 2012-01-12 | 2014-12-02 | International Business Machines Corporation | Predicting pronouns of dropped pronoun style languages for natural language translation |
US20150161109A1 (en) * | 2012-01-13 | 2015-06-11 | Google Inc. | Reordering words for machine translation |
CN102662935A (zh) * | 2012-04-08 | 2012-09-12 | 北京语智云帆科技有限公司 | Interactive machine translation method and machine translation system
US8971630B2 (en) | 2012-04-27 | 2015-03-03 | Abbyy Development Llc | Fast CJK character recognition |
US8989485B2 (en) | 2012-04-27 | 2015-03-24 | Abbyy Development Llc | Detecting a junction in a text line of CJK characters |
CN102999486B (zh) * | 2012-11-16 | 2016-12-21 | 沈阳雅译网络技术有限公司 | Combination-based phrase rule extraction method
CN105808076A (zh) * | 2012-12-14 | 2016-07-27 | ZTE Corporation | Method, apparatus, and terminal for setting a browser bookmark
US9600473B2 (en) | 2013-02-08 | 2017-03-21 | Machine Zone, Inc. | Systems and methods for multi-user multi-lingual communications |
US9298703B2 (en) | 2013-02-08 | 2016-03-29 | Machine Zone, Inc. | Systems and methods for incentivizing user feedback for translation processing |
US10650103B2 (en) | 2013-02-08 | 2020-05-12 | Mz Ip Holdings, Llc | Systems and methods for incentivizing user feedback for translation processing |
US9031829B2 (en) | 2013-02-08 | 2015-05-12 | Machine Zone, Inc. | Systems and methods for multi-user multi-lingual communications |
US9231898B2 (en) | 2013-02-08 | 2016-01-05 | Machine Zone, Inc. | Systems and methods for multi-user multi-lingual communications |
US8996352B2 (en) | 2013-02-08 | 2015-03-31 | Machine Zone, Inc. | Systems and methods for correcting translations in multi-user multi-lingual communications |
JP6058513B2 (ja) * | 2013-10-01 | 2017-01-11 | Nippon Telegraph And Telephone Corporation | Word reordering apparatus, translation apparatus, method, and program
JP6226321B2 (ja) * | 2013-10-23 | 2017-11-08 | SunFlare Co., Ltd. | Translation support system, translation support system server, translation support system client, translation support system control method, and program therefor
KR102256291B1 (ko) * | 2013-11-15 | 2021-05-27 | Samsung Electronics Co., Ltd. | Method for recognizing a translation context and performing a translation function, and electronic device implementing the same
RU2592395C2 (ru) | 2013-12-19 | 2016-07-20 | Abbyy InfoPoisk LLC | Resolving semantic ambiguity by means of statistical analysis
CN103645931B (zh) | 2013-12-25 | 2016-06-22 | Sheng Jie | Code conversion method and apparatus
JP6323828B2 (ja) * | 2013-12-27 | 2018-05-16 | International Business Machines Corporation | Support apparatus, information processing method, and program
RU2586577C2 (ru) | 2014-01-15 | 2016-06-10 | Abbyy InfoPoisk LLC | Filtering arcs in a syntactic graph
US9524293B2 (en) * | 2014-08-15 | 2016-12-20 | Google Inc. | Techniques for automatically swapping languages and/or content for machine translation |
RU2596600C2 (ru) | 2014-09-02 | 2016-09-10 | Abbyy Development LLC | Methods and systems for processing images of mathematical expressions
US10162811B2 (en) | 2014-10-17 | 2018-12-25 | Mz Ip Holdings, Llc | Systems and methods for language detection |
US9372848B2 (en) | 2014-10-17 | 2016-06-21 | Machine Zone, Inc. | Systems and methods for language detection |
US9626358B2 (en) | 2014-11-26 | 2017-04-18 | Abbyy Infopoisk Llc | Creating ontologies by analyzing natural language texts |
US10333696B2 (en) | 2015-01-12 | 2019-06-25 | X-Prime, Inc. | Systems and methods for implementing an efficient, scalable homomorphic transformation of encrypted data with minimal data expansion and improved processing efficiency |
CN105117389B (zh) * | 2015-07-28 | 2018-01-19 | Baidu Online Network Technology (Beijing) Co., Ltd. | Translation method and apparatus
CN106484681B (zh) | 2015-08-25 | 2019-07-09 | Alibaba Group Holding Limited | Method, apparatus, and electronic device for generating candidate translations
CN106484682B (zh) | 2015-08-25 | 2019-06-25 | Alibaba Group Holding Limited | Statistics-based machine translation method, apparatus, and electronic device
US10586168B2 (en) | 2015-10-08 | 2020-03-10 | Facebook, Inc. | Deep translations |
US9990361B2 (en) * | 2015-10-08 | 2018-06-05 | Facebook, Inc. | Language independent representations |
US10765956B2 (en) | 2016-01-07 | 2020-09-08 | Machine Zone Inc. | Named entity recognition on chat data |
CN106021239B (zh) * | 2016-04-29 | 2018-10-26 | 北京创鑫旅程网络技术有限公司 | Real-time translation quality evaluation method
US10346548B1 (en) * | 2016-09-26 | 2019-07-09 | Lilt, Inc. | Apparatus and method for prefix-constrained decoding in a neural machine translation system |
US10261995B1 (en) | 2016-09-28 | 2019-04-16 | Amazon Technologies, Inc. | Semantic and natural language processing for content categorization and routing |
US10235362B1 (en) * | 2016-09-28 | 2019-03-19 | Amazon Technologies, Inc. | Continuous translation refinement with automated delivery of re-translated content |
US10275459B1 (en) | 2016-09-28 | 2019-04-30 | Amazon Technologies, Inc. | Source language content scoring for localizability |
KR102130429B1 (ko) * | 2016-11-07 | 2020-07-07 | Hanwha Techwin Co., Ltd. | Method for performing decoding in a multimedia receiving apparatus, and multimedia apparatus
US10417350B1 (en) | 2017-08-28 | 2019-09-17 | Amazon Technologies, Inc. | Artificial intelligence system for automated adaptation of text-based classification models for multiple languages |
WO2019060353A1 (en) | 2017-09-21 | 2019-03-28 | Mz Ip Holdings, Llc | SYSTEM AND METHOD FOR TRANSLATION OF KEYBOARD MESSAGES |
WO2019107623A1 (ko) * | 2017-11-30 | 2019-06-06 | Systran International Co., Ltd. | Machine translation method and apparatus therefor
EP3769238A4 (en) * | 2018-03-19 | 2022-01-26 | Coffing, Daniel L. | PROCESSING OF ARGUMENTS AND PROPOSITIONS IN NATURAL LANGUAGE |
EP3847643A4 (en) | 2018-09-06 | 2022-04-20 | Coffing, Daniel L. | DIALOG GUIDANCE PROVIDING SYSTEM |
EP3850781A4 (en) | 2018-09-14 | 2022-05-04 | Coffing, Daniel L. | FACT MANAGEMENT SYSTEM |
CN109710952B (zh) * | 2018-12-27 | 2023-06-16 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Artificial-intelligence-based translation history retrieval method, apparatus, device, and medium
US11599731B2 (en) * | 2019-10-02 | 2023-03-07 | Oracle International Corporation | Generating recommendations by using communicative discourse trees of conversations |
CN111104807A (zh) * | 2019-12-06 | 2020-05-05 | Beijing Sogou Technology Development Co., Ltd. | Data processing method, apparatus, and electronic device
US11822892B2 (en) * | 2020-12-16 | 2023-11-21 | International Business Machines Corporation | Automated natural language splitting for generation of knowledge graphs |
KR102562920B1 (ko) * | 2020-12-29 | 2023-08-02 | XL8 Inc. | Apparatus and method for machine translation
CN112784612B (zh) * | 2021-01-26 | 2023-12-22 | 浙江香侬慧语科技有限责任公司 | Method, apparatus, medium, and device for simultaneous machine translation based on iterative revision
CN113705158A (zh) * | 2021-09-26 | 2021-11-26 | Shanghai Yizhe Information Technology Co., Ltd. | Method for intelligently restoring the source-text style in document translation
CN115795039B (zh) * | 2023-02-08 | 2023-06-02 | Chengdu Sobey Digital Technology Co., Ltd. | Deep-learning-based stylized headline generation method, device, and medium
Family Cites Families (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS61217871A (ja) * | 1985-03-25 | 1986-09-27 | Toshiba Corp | Translation processing apparatus
DE3616751A1 (de) * | 1985-05-20 | 1986-11-20 | Sharp K.K., Osaka | Translation system
JPH02301869A (ja) * | 1989-05-17 | 1990-12-13 | Hitachi Ltd | Maintenance support method for a natural language processing system
US5369574A (en) | 1990-08-01 | 1994-11-29 | Canon Kabushiki Kaisha | Sentence generating system |
GB2279164A (en) * | 1993-06-18 | 1994-12-21 | Canon Res Ct Europe Ltd | Processing a bilingual database. |
US6304841B1 (en) * | 1993-10-28 | 2001-10-16 | International Business Machines Corporation | Automatic construction of conditional exponential models from elementary features |
US5510981A (en) * | 1993-10-28 | 1996-04-23 | International Business Machines Corporation | Language translation apparatus and method using context-based translation models |
JP3377290B2 (ja) * | 1994-04-27 | 2003-02-17 | Sharp Corporation | Machine translation apparatus with idiom processing function
GB2295470A (en) * | 1994-11-28 | 1996-05-29 | Sharp Kk | Machine translation system |
DE69837979T2 (de) * | 1997-06-27 | 2008-03-06 | International Business Machines Corp. | System for extracting multilingual terminology
US6533822B2 (en) | 1998-01-30 | 2003-03-18 | Xerox Corporation | Creating summaries along with indicators, and automatically positioned tabs |
GB2337611A (en) * | 1998-05-20 | 1999-11-24 | Sharp Kk | Multilingual document retrieval system |
GB2338089A (en) * | 1998-06-02 | 1999-12-08 | Sharp Kk | Indexing method |
US6092034A (en) * | 1998-07-27 | 2000-07-18 | International Business Machines Corporation | Statistical translation system and method for fast sense disambiguation and translation of large corpora using fertility models and sense models |
JP2000132550A (ja) * | 1998-10-26 | 2000-05-12 | Matsushita Electric Ind Co Ltd | Chinese generation apparatus for machine translation
US6393389B1 (en) * | 1999-09-23 | 2002-05-21 | Xerox Corporation | Using ranked translation choices to obtain sequences indicating meaning of multi-token expressions |
2001
- 2001-05-11 AU AU2001261505A patent/AU2001261505A1/en not_active Abandoned
- 2001-05-11 WO PCT/US2001/015379 patent/WO2001086491A2/en active Application Filing
- 2001-05-11 CA CA002408819A patent/CA2408819C/en not_active Expired - Lifetime
- 2001-05-11 EP EP01935406A patent/EP1352338A2/en not_active Withdrawn
- 2001-05-11 CN CN01812317A patent/CN1465018A/zh active Pending
- 2001-05-11 AU AU2001261506A patent/AU2001261506A1/en not_active Abandoned
- 2001-05-11 US US09/854,327 patent/US7533013B2/en not_active Expired - Lifetime
- 2001-05-11 US US09/854,301 patent/US20020046018A1/en not_active Abandoned
- 2001-05-11 WO PCT/US2001/015380 patent/WO2001086489A2/en active Search and Examination
- 2001-05-11 JP JP2001583366A patent/JP2004501429A/ja active Pending
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5477451A (en) * | 1991-07-25 | 1995-12-19 | International Business Machines Corp. | Method and system for natural language translation |
US5805832A (en) * | 1991-07-25 | 1998-09-08 | International Business Machines Corporation | System for parametric text to text language translation |
US5642520A (en) * | 1993-12-07 | 1997-06-24 | Nippon Telegraph And Telephone Corporation | Method and apparatus for recognizing topic structure of language data |
US5761631A (en) * | 1994-11-17 | 1998-06-02 | International Business Machines Corporation | Parsing method and system for natural language processing |
US5903858A (en) * | 1995-06-23 | 1999-05-11 | Saraki; Masashi | Translation machine for editing a original text by rewriting the same and translating the rewrote one |
US6205456B1 (en) * | 1997-01-17 | 2001-03-20 | Fujitsu Limited | Summarization apparatus and method |
US5991710A (en) * | 1997-05-20 | 1999-11-23 | International Business Machines Corporation | Statistical translation system with features based on phrases or groups of words |
US6112168A (en) * | 1997-10-20 | 2000-08-29 | Microsoft Corporation | Automatically recognizing the discourse structure of a body of text |
Cited By (372)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10198438B2 (en) | 1999-09-17 | 2019-02-05 | Sdl Inc. | E-services translation utilizing machine translation and translation memory |
US10216731B2 (en) | 1999-09-17 | 2019-02-26 | Sdl Inc. | E-services translation utilizing machine translation and translation memory |
US6961692B1 (en) * | 2000-08-01 | 2005-11-01 | Fuji Xerox Co., Ltd. | System and method for writing analysis using the linguistic discourse model
US20170132529A1 (en) * | 2000-09-28 | 2017-05-11 | Intel Corporation | Method and Apparatus for Extracting Entity Names and Their Relations |
US9954794B2 (en) | 2001-01-18 | 2018-04-24 | Sdl Inc. | Globalization management system and method therefor |
US20020138248A1 (en) * | 2001-01-26 | 2002-09-26 | Corston-Oliver Simon H. | Lingustically intelligent text compression |
US7398203B2 (en) | 2001-01-26 | 2008-07-08 | Microsoft Corporation | Linguistically intelligent text compression |
US7069207B2 (en) * | 2001-01-26 | 2006-06-27 | Microsoft Corporation | Linguistically intelligent text compression |
US20060184351A1 (en) * | 2001-01-26 | 2006-08-17 | Microsoft Corporation | Linguistically intelligent text compression |
US7454698B2 (en) * | 2001-02-15 | 2008-11-18 | International Business Machines Corporation | Digital document browsing system and method thereof |
US20020186241A1 (en) * | 2001-02-15 | 2002-12-12 | Ibm | Digital document browsing system and method thereof |
US8214196B2 (en) | 2001-07-03 | 2012-07-03 | University Of Southern California | Syntax-based statistical translation model |
US7491690B2 (en) | 2001-11-14 | 2009-02-17 | Northwestern University | Self-assembly and mineralization of peptide-amphiphile nanofibers |
US7838491B2 (en) | 2001-11-14 | 2010-11-23 | Northwestern University | Self-assembly and mineralization of peptide-amphiphile nanofibers |
US20090156505A1 (en) * | 2001-11-14 | 2009-06-18 | Northwestern University | Self-assembly and mineralization of peptide-amphiphile nanofibers |
US20080033715A1 (en) * | 2002-01-14 | 2008-02-07 | Microsoft Corporation | System for normalizing a discourse representation structure and normalized data structure |
US8412515B2 (en) * | 2002-01-14 | 2013-04-02 | Microsoft Corporation | System for normalizing a discourse representation structure and normalized data structure |
US7371719B2 (en) | 2002-02-15 | 2008-05-13 | Northwestern University | Self-assembly of peptide-amphiphile nanofibers under physiological conditions |
US20110008890A1 (en) * | 2002-02-15 | 2011-01-13 | Northwestern University | Self-Assembly of Peptide-Amphiphile Nanofibers Under Physiological Conditions |
US20040001893A1 (en) * | 2002-02-15 | 2004-01-01 | Stupp Samuel I. | Self-assembly of peptide-amphiphile nanofibers under physiological conditions |
US20080177033A1 (en) * | 2002-02-15 | 2008-07-24 | Stupp Samuel I | Self-Assembly of Peptide-Amphiphile Nanofibers under Physiological Conditions |
US8063014B2 (en) | 2002-02-15 | 2011-11-22 | Northwestern University | Self-assembly of peptide-amphiphile nanofibers under physiological conditions |
US7745708B2 (en) | 2002-02-15 | 2010-06-29 | Northwestern University | Self-assembly of peptide-amphiphile nanofibers under physiological conditions |
US10410533B2 (en) | 2002-02-22 | 2019-09-10 | Educational Testing Service | Portal assessment design system for educational testing |
US20050170325A1 (en) * | 2002-02-22 | 2005-08-04 | Steinberg Linda S. | Portal assessment design system for educational testing |
US8651873B2 (en) | 2002-02-22 | 2014-02-18 | Educational Testing Service | Portal assessment design system for educational testing |
US20060004732A1 (en) * | 2002-02-26 | 2006-01-05 | Odom Paul S | Search engine methods and systems for generating relevant search results and advertisements |
US20030167252A1 (en) * | 2002-02-26 | 2003-09-04 | Pliant Technologies, Inc. | Topic identification and use thereof in information retrieval systems |
US7340466B2 (en) * | 2002-02-26 | 2008-03-04 | Kang Jo Mgmt. Limited Liability Company | Topic identification and use thereof in information retrieval systems |
US20100262603A1 (en) * | 2002-02-26 | 2010-10-14 | Odom Paul S | Search engine methods and systems for displaying relevant topics |
US20070265996A1 (en) * | 2002-02-26 | 2007-11-15 | Odom Paul S | Search engine methods and systems for displaying relevant topics |
US7716207B2 (en) | 2002-02-26 | 2010-05-11 | Odom Paul S | Search engine methods and systems for displaying relevant topics |
US8234106B2 (en) | 2002-03-26 | 2012-07-31 | University Of Southern California | Building a translation lexicon from comparable, non-parallel corpora |
US20100042398A1 (en) * | 2002-03-26 | 2010-02-18 | Daniel Marcu | Building A Translation Lexicon From Comparable, Non-Parallel Corpora |
US7516415B2 (en) * | 2002-03-28 | 2009-04-07 | Fujitsu Limited | Apparatus for and method of generating synchronized contents information, and computer product |
US20030188255A1 (en) * | 2002-03-28 | 2003-10-02 | Fujitsu Limited | Apparatus for and method of generating synchronized contents information, and computer product |
US20100318348A1 (en) * | 2002-05-20 | 2010-12-16 | Microsoft Corporation | Applying a structured language model to information extraction |
US8706491B2 (en) * | 2002-05-20 | 2014-04-22 | Microsoft Corporation | Applying a structured language model to information extraction |
EP1535261A1 (en) * | 2002-06-24 | 2005-06-01 | Educational Testing Service | Automated essay annotation system and method |
EP1535261A4 (en) * | 2002-06-24 | 2011-02-09 | Educational Testing Service | SYSTEM AND METHOD FOR AUTOMATIC REPORTING ANNOTATION |
US7534761B1 (en) | 2002-08-21 | 2009-05-19 | Northwestern University | Charged peptide-amphiphile solutions and self-assembled peptide nanofiber networks formed therefrom
US7305336B2 (en) * | 2002-08-30 | 2007-12-04 | Fuji Xerox Co., Ltd. | System and method for summarization combining natural language generation with structural analysis |
US20040044519A1 (en) * | 2002-08-30 | 2004-03-04 | Livia Polanyi | System and method for summarization combining natural language generation with structural analysis |
US20040117734A1 (en) * | 2002-09-30 | 2004-06-17 | Frank Krickhahn | Method and apparatus for structuring texts |
US7554021B2 (en) | 2002-11-12 | 2009-06-30 | Northwestern University | Composition and method for self-assembly and mineralization of peptide amphiphiles |
US8124583B2 (en) | 2002-11-12 | 2012-02-28 | Northwestern University | Composition and method for self-assembly and mineralization of peptide-amphiphiles |
US20060149036A1 (en) * | 2002-11-12 | 2006-07-06 | Stupp Samuel I | Composition and method for self-assembly and mineralization of peptide amphiphiles
US7683025B2 (en) | 2002-11-14 | 2010-03-23 | Northwestern University | Synthesis and self-assembly of ABC triblock bola peptide amphiphiles |
GB2411028A (en) * | 2002-11-14 | 2005-08-17 | Educational Testing Service | Automated evaluation of overly repetitive word use in an essay |
WO2004046956A1 (en) * | 2002-11-14 | 2004-06-03 | Educational Testing Service | Automated evaluation of overly repetitive word use in an essay |
US20040167885A1 (en) * | 2002-12-06 | 2004-08-26 | Attensity Corporation | Data products of processes of extracting role related information from free text sources |
US20040167883A1 (en) * | 2002-12-06 | 2004-08-26 | Attensity Corporation | Methods and systems for providing a service for producing structured data elements from free text sources |
US20040167911A1 (en) * | 2002-12-06 | 2004-08-26 | Attensity Corporation | Methods and products for integrating mixed format data including the extraction of relational facts from free text |
US20040167870A1 (en) * | 2002-12-06 | 2004-08-26 | Attensity Corporation | Systems and methods for providing a mixed data integration service |
US20040167910A1 (en) * | 2002-12-06 | 2004-08-26 | Attensity Corporation | Integrated data products of processes of integrating mixed format data |
US20040167908A1 (en) * | 2002-12-06 | 2004-08-26 | Attensity Corporation | Integration of structured data with free text for data mining |
US20040215634A1 (en) * | 2002-12-06 | 2004-10-28 | Attensity Corporation | Methods and products for merging codes and notes into an integrated relational database |
US20050108256A1 (en) * | 2002-12-06 | 2005-05-19 | Attensity Corporation | Visualization of integrated structured and unstructured data |
US20040167884A1 (en) * | 2002-12-06 | 2004-08-26 | Attensity Corporation | Methods and products for producing role related information from free text sources |
US20040167887A1 (en) * | 2002-12-06 | 2004-08-26 | Attensity Corporation | Integration of structured data with relational facts from free text for data mining |
US20040258726A1 (en) * | 2003-02-11 | 2004-12-23 | Stupp Samuel I. | Methods and materials for nanocrystalline surface coatings and attachment of peptide amphiphile nanofibers thereon |
US7390526B2 (en) | 2003-02-11 | 2008-06-24 | Northwestern University | Methods and materials for nanocrystalline surface coatings and attachment of peptide amphiphile nanofibers thereon |
US20040230415A1 (en) * | 2003-05-12 | 2004-11-18 | Stefan Riezler | Systems and methods for grammatical text condensation |
US20050038643A1 (en) * | 2003-07-02 | 2005-02-17 | Philipp Koehn | Statistical noun phrase translation |
US8548794B2 (en) | 2003-07-02 | 2013-10-01 | University Of Southern California | Statistical noun phrase translation |
US20050086592A1 (en) * | 2003-10-15 | 2005-04-21 | Livia Polanyi | Systems and methods for hybrid text summarization |
US7610190B2 (en) * | 2003-10-15 | 2009-10-27 | Fuji Xerox Co., Ltd. | Systems and methods for hybrid text summarization |
US8138140B2 (en) | 2003-12-05 | 2012-03-20 | Northwestern University | Self-assembling peptide amphiphiles and related methods for growth factor delivery |
US8580923B2 (en) | 2003-12-05 | 2013-11-12 | Northwestern University | Self-assembling peptide amphiphiles and related methods for growth factor delivery |
US7452679B2 (en) | 2003-12-05 | 2008-11-18 | Northwestern University | Branched peptide amphiphiles, related epitope compounds and self assembled structures thereof |
US20090269847A1 (en) * | 2003-12-05 | 2009-10-29 | Northwestern University | Self-assembling peptide amphiphiles and related methods for growth factor delivery |
US7544661B2 (en) | 2003-12-05 | 2009-06-09 | Northwestern University | Self-assembling peptide amphiphiles and related methods for growth factor delivery |
US20050209145A1 (en) * | 2003-12-05 | 2005-09-22 | Stupp Samuel I | Self-assembling peptide amphiphiles and related methods for growth factor delivery |
US20050208589A1 (en) * | 2003-12-05 | 2005-09-22 | Stupp Samuel I | Branched peptide amphiphiles, related epitope compounds and self assembled structures thereof |
US20050138556A1 (en) * | 2003-12-18 | 2005-06-23 | Xerox Corporation | Creation of normalized summaries using common domain models for input text analysis and output text generation |
US7788083B2 (en) * | 2003-12-19 | 2010-08-31 | Palo Alto Research Center Incorporated | Systems and methods for the generation of alternate phrases from packed meaning |
US20050137855A1 (en) * | 2003-12-19 | 2005-06-23 | Maxwell John T.Iii | Systems and methods for the generation of alternate phrases from packed meaning |
US20070250305A1 (en) * | 2003-12-19 | 2007-10-25 | Xerox Corporation | Systems and methods for the generation of alternate phrases from packed meaning |
US7657420B2 (en) | 2003-12-19 | 2010-02-02 | Palo Alto Research Center Incorporated | Systems and methods for the generation of alternate phrases from packed meaning |
US20050187772A1 (en) * | 2004-02-25 | 2005-08-25 | Fuji Xerox Co., Ltd. | Systems and methods for synthesizing speech using discourse function level prosodic features |
US10248650B2 (en) | 2004-03-05 | 2019-04-02 | Sdl Inc. | In-context exact (ICE) matching |
US8296127B2 (en) | 2004-03-23 | 2012-10-23 | University Of Southern California | Discovery of parallel text portions in comparable collections of corpora and training using comparable texts |
US20050221266A1 (en) * | 2004-04-02 | 2005-10-06 | Mislevy Robert J | System and method for assessment design |
US8977536B2 (en) | 2004-04-16 | 2015-03-10 | University Of Southern California | Method and system for translating information with a higher probability of a correct translation |
US20080270109A1 (en) * | 2004-04-16 | 2008-10-30 | University Of Southern California | Method and System for Translating Information with a Higher Probability of a Correct Translation |
US8666725B2 (en) | 2004-04-16 | 2014-03-04 | University Of Southern California | Selection and use of nonstatistical translation components in a statistical machine translation framework |
US20050256848A1 (en) * | 2004-05-13 | 2005-11-17 | International Business Machines Corporation | System and method for user rank search |
US7562008B2 (en) * | 2004-06-23 | 2009-07-14 | Ning-Ping Chan | Machine translation method and system that decomposes complex sentences into two or more sentences |
US20060009961A1 (en) * | 2004-06-23 | 2006-01-12 | Ning-Ping Chan | Method of decomposing prose elements in document processing |
US20100174524A1 (en) * | 2004-07-02 | 2010-07-08 | Philipp Koehn | Empirical Methods for Splitting Compound Words with Application to Machine Translation |
US20060010138A1 (en) * | 2004-07-09 | 2006-01-12 | International Business Machines Corporation | Method and system for efficient representation, manipulation, communication, and search of hierarchical composite named entities |
US8768969B2 (en) * | 2004-07-09 | 2014-07-01 | Nuance Communications, Inc. | Method and system for efficient representation, manipulation, communication, and search of hierarchical composite named entities |
US8489628B2 (en) | 2004-07-26 | 2013-07-16 | Google Inc. | Phrase-based detection of duplicate documents in an information retrieval system |
US7599914B2 (en) | 2004-07-26 | 2009-10-06 | Google Inc. | Phrase-based searching in an information retrieval system |
US9037573B2 (en) | 2004-07-26 | 2015-05-19 | Google, Inc. | Phrase-based personalization of searches in an information retrieval system
US20100030773A1 (en) * | 2004-07-26 | 2010-02-04 | Google Inc. | Multiple index based information retrieval system |
US20080306943A1 (en) * | 2004-07-26 | 2008-12-11 | Anna Lynn Patterson | Phrase-based detection of duplicate documents in an information retrieval system |
US20060294155A1 (en) * | 2004-07-26 | 2006-12-28 | Patterson Anna L | Detecting spam documents in a phrase based information retrieval system |
US20080319971A1 (en) * | 2004-07-26 | 2008-12-25 | Anna Lynn Patterson | Phrase-based personalization of searches in an information retrieval system |
US9361331B2 (en) | 2004-07-26 | 2016-06-07 | Google Inc. | Multiple index based information retrieval system |
US7702618B1 (en) | 2004-07-26 | 2010-04-20 | Google Inc. | Information retrieval system for archiving multiple document versions |
US7711679B2 (en) | 2004-07-26 | 2010-05-04 | Google Inc. | Phrase-based detection of duplicate documents in an information retrieval system |
US9384224B2 (en) | 2004-07-26 | 2016-07-05 | Google Inc. | Information retrieval system for archiving multiple document versions |
US9569505B2 (en) | 2004-07-26 | 2017-02-14 | Google Inc. | Phrase-based searching in an information retrieval system |
CN1728143B (zh) * | 2004-07-26 | 2010-06-09 | Google Inc. | Phrase-based generation of document descriptions
US20060020473A1 (en) * | 2004-07-26 | 2006-01-26 | Atsuo Hiroe | Method, apparatus, and program for dialogue, and storage medium including a program stored therein |
US20100161625A1 (en) * | 2004-07-26 | 2010-06-24 | Google Inc. | Phrase-based detection of duplicate documents in an information retrieval system |
US7426507B1 (en) | 2004-07-26 | 2008-09-16 | Google, Inc. | Automatic taxonomy generation in search results using phrases |
US8560550B2 (en) | 2004-07-26 | 2013-10-15 | Google, Inc. | Multiple index based information retrieval system |
US7536408B2 (en) | 2004-07-26 | 2009-05-19 | Google Inc. | Phrase-based indexing in an information retrieval system |
US7603345B2 (en) | 2004-07-26 | 2009-10-13 | Google Inc. | Detecting spam documents in a phrase based information retrieval system |
US20060020571A1 (en) * | 2004-07-26 | 2006-01-26 | Patterson Anna L | Phrase-based generation of document descriptions |
US9817825B2 (en) | 2004-07-26 | 2017-11-14 | Google Llc | Multiple index based information retrieval system |
US9817886B2 (en) | 2004-07-26 | 2017-11-14 | Google Llc | Information retrieval system for archiving multiple document versions |
US7584175B2 (en) * | 2004-07-26 | 2009-09-01 | Google Inc. | Phrase-based generation of document descriptions |
US7567959B2 (en) | 2004-07-26 | 2009-07-28 | Google Inc. | Multiple index based information retrieval system |
US10671676B2 (en) | 2004-07-26 | 2020-06-02 | Google Llc | Multiple index based information retrieval system |
US20060020607A1 (en) * | 2004-07-26 | 2006-01-26 | Patterson Anna L | Phrase-based indexing in an information retrieval system |
US7580929B2 (en) | 2004-07-26 | 2009-08-25 | Google Inc. | Phrase-based personalization of searches in an information retrieval system |
US9990421B2 (en) | 2004-07-26 | 2018-06-05 | Google Llc | Phrase-based searching in an information retrieval system |
US20110131223A1 (en) * | 2004-07-26 | 2011-06-02 | Google Inc. | Detecting spam documents in a phrase based information retrieval system |
US20060031195A1 (en) * | 2004-07-26 | 2006-02-09 | Patterson Anna L | Phrase-based searching in an information retrieval system |
US7580921B2 (en) | 2004-07-26 | 2009-08-25 | Google Inc. | Phrase identification in an information retrieval system |
US8108412B2 (en) | 2004-07-26 | 2012-01-31 | Google, Inc. | Phrase-based detection of duplicate documents in an information retrieval system |
US8078629B2 (en) | 2004-07-26 | 2011-12-13 | Google Inc. | Detecting spam documents in a phrase based information retrieval system |
US20060142995A1 (en) * | 2004-10-12 | 2006-06-29 | Kevin Knight | Training for a text-to-text application which uses string to tree conversion for training and decoding |
US8600728B2 (en) | 2004-10-12 | 2013-12-03 | University Of Southern California | Training for a text-to-text application which uses string to tree conversion for training and decoding |
US20060095250A1 (en) * | 2004-11-03 | 2006-05-04 | Microsoft Corporation | Parser for natural language processing |
US7970600B2 (en) | 2004-11-03 | 2011-06-28 | Microsoft Corporation | Using a first natural language parser to train a second parser |
US7827029B2 (en) * | 2004-11-30 | 2010-11-02 | Palo Alto Research Center Incorporated | Systems and methods for user-interest sensitive note-taking |
US20060116860A1 (en) * | 2004-11-30 | 2006-06-01 | Xerox Corporation | Systems and methods for user-interest sensitive condensation |
US7801723B2 (en) * | 2004-11-30 | 2010-09-21 | Palo Alto Research Center Incorporated | Systems and methods for user-interest sensitive condensation |
US20060155530A1 (en) * | 2004-12-14 | 2006-07-13 | International Business Machines Corporation | Method and apparatus for generation of text documents |
US20070240078A1 (en) * | 2004-12-21 | 2007-10-11 | Palo Alto Research Center Incorporated | Systems and methods for using and constructing user-interest sensitive indicators of search results |
US7890500B2 (en) | 2004-12-21 | 2011-02-15 | Palo Alto Research Center Incorporated | Systems and methods for using and constructing user-interest sensitive indicators of search results |
US20060247165A1 (en) * | 2005-01-21 | 2006-11-02 | Stupp Samuel I | Methods and compositions for encapsulation of cells |
US20100169305A1 (en) * | 2005-01-25 | 2010-07-01 | Google Inc. | Information retrieval system for archiving multiple document versions |
US8612427B2 (en) | 2005-01-25 | 2013-12-17 | Google, Inc. | Information retrieval system for archiving multiple document versions |
US20060194183A1 (en) * | 2005-02-28 | 2006-08-31 | Yigal Attali | Method of model scaling for an automated essay scoring system |
US8202098B2 (en) | 2005-02-28 | 2012-06-19 | Educational Testing Service | Method of model scaling for an automated essay scoring system |
US8632344B2 (en) | 2005-02-28 | 2014-01-21 | Educational Testing Service | Method of model scaling for an automated essay scoring system |
US20070277250A1 (en) * | 2005-03-04 | 2007-11-29 | Stupp Samuel I | Angiogenic heparin-binding epitopes, peptide amphiphiles, self-assembled compositions and related methods of use |
US7851445B2 (en) | 2005-03-04 | 2010-12-14 | Northwestern University | Angiogenic heparin-binding epitopes, peptide amphiphiles, self-assembled compositions and related methods of use |
US20060277028A1 (en) * | 2005-06-01 | 2006-12-07 | Microsoft Corporation | Training a statistical parser on noisy data by filtering |
US8886517B2 (en) | 2005-06-17 | 2014-11-11 | Language Weaver, Inc. | Trust scoring for language translation systems |
US20070192309A1 (en) * | 2005-10-12 | 2007-08-16 | Gordon Fischer | Method and system for identifying sentence boundaries |
US10319252B2 (en) * | 2005-11-09 | 2019-06-11 | Sdl Inc. | Language capability assessment and training apparatus and techniques |
US9165039B2 (en) | 2005-11-29 | 2015-10-20 | Kang Jo Mgmt, Limited Liability Company | Methods and systems for providing personalized contextual search results |
US20070260598A1 (en) * | 2005-11-29 | 2007-11-08 | Odom Paul S | Methods and systems for providing personalized contextual search results |
US8416713B2 (en) | 2006-03-06 | 2013-04-09 | Koninklijke Philips Electronics N.V. | Use of decision trees for automatic commissioning |
US20090045971A1 (en) * | 2006-03-06 | 2009-02-19 | Koninklijke Philips Electronics N.V. | Use of decision trees for automatic commissioning |
WO2007117652A3 (en) * | 2006-04-07 | 2008-05-02 | Basis Technology Corp | Method and system of machine translation |
US7827028B2 (en) | 2006-04-07 | 2010-11-02 | Basis Technology Corporation | Method and system of machine translation |
US8943080B2 (en) | 2006-04-07 | 2015-01-27 | University Of Southern California | Systems and methods for identifying parallel documents and sentence fragments in multilingual document collections |
US20070260449A1 (en) * | 2006-05-02 | 2007-11-08 | Shimei Pan | Instance-based sentence boundary determination by optimization |
US7552047B2 (en) * | 2006-05-02 | 2009-06-23 | International Business Machines Corporation | Instance-based sentence boundary determination by optimization |
US8886518B1 (en) | 2006-08-07 | 2014-11-11 | Language Weaver, Inc. | System and method for capitalizing machine translated text |
US8433556B2 (en) | 2006-11-02 | 2013-04-30 | University Of Southern California | Semi-supervised training for statistical word alignment |
US9122674B1 (en) | 2006-12-15 | 2015-09-01 | Language Weaver, Inc. | Use of annotations in statistical machine translation |
US8468149B1 (en) | 2007-01-26 | 2013-06-18 | Language Weaver, Inc. | Multi-lingual online community |
US8615389B1 (en) | 2007-03-16 | 2013-12-24 | Language Weaver, Inc. | Generation and exploitation of an approximate language model |
US7925655B1 (en) | 2007-03-30 | 2011-04-12 | Google Inc. | Query scheduling using hierarchical tiers of index servers |
US7702614B1 (en) | 2007-03-30 | 2010-04-20 | Google Inc. | Index updating using segment swapping |
US8600975B1 (en) | 2007-03-30 | 2013-12-03 | Google Inc. | Query phrasification |
US8166021B1 (en) | 2007-03-30 | 2012-04-24 | Google Inc. | Query phrasification |
US8943067B1 (en) | 2007-03-30 | 2015-01-27 | Google Inc. | Index server architecture using tiered and sharded phrase posting lists |
US8402033B1 (en) | 2007-03-30 | 2013-03-19 | Google Inc. | Phrase extraction using subphrase scoring |
US8682901B1 (en) | 2007-03-30 | 2014-03-25 | Google Inc. | Index server architecture using tiered and sharded phrase posting lists |
US9652483B1 (en) | 2007-03-30 | 2017-05-16 | Google Inc. | Index server architecture using tiered and sharded phrase posting lists |
US8086594B1 (en) | 2007-03-30 | 2011-12-27 | Google Inc. | Bifurcated document relevance scoring |
US8166045B1 (en) | 2007-03-30 | 2012-04-24 | Google Inc. | Phrase extraction using subphrase scoring |
US10152535B1 (en) | 2007-03-30 | 2018-12-11 | Google Llc | Query phrasification |
US7693813B1 (en) | 2007-03-30 | 2010-04-06 | Google Inc. | Index server architecture using tiered and sharded phrase posting lists |
US9355169B1 (en) | 2007-03-30 | 2016-05-31 | Google Inc. | Phrase extraction using subphrase scoring |
US8090723B2 (en) | 2007-03-30 | 2012-01-03 | Google Inc. | Index server architecture using tiered and sharded phrase posting lists |
US9223877B1 (en) | 2007-03-30 | 2015-12-29 | Google Inc. | Index server architecture using tiered and sharded phrase posting lists |
US20100161617A1 (en) * | 2007-03-30 | 2010-06-24 | Google Inc. | Index server architecture using tiered and sharded phrase posting lists |
US8831928B2 (en) | 2007-04-04 | 2014-09-09 | Language Weaver, Inc. | Customizable machine translation service |
US8076295B2 (en) | 2007-04-17 | 2011-12-13 | Nanotope, Inc. | Peptide amphiphiles having improved solubility and methods of using same |
US20090042804A1 (en) * | 2007-04-17 | 2009-02-12 | Hulvat James F | Novel peptide amphiphiles having improved solubility and methods of using same |
US7925496B1 (en) * | 2007-04-23 | 2011-04-12 | The United States Of America As Represented By The Secretary Of The Navy | Method for summarizing natural language text |
US8825466B1 (en) | 2007-06-08 | 2014-09-02 | Language Weaver, Inc. | Modification of annotated bilingual segment pairs in syntax-based machine translation |
US20090006080A1 (en) * | 2007-06-29 | 2009-01-01 | Fujitsu Limited | Computer-readable medium having sentence dividing program stored thereon, sentence dividing apparatus, and sentence dividing method |
US9009023B2 (en) * | 2007-06-29 | 2015-04-14 | Fujitsu Limited | Computer-readable medium having sentence dividing program stored thereon, sentence dividing apparatus, and sentence dividing method |
US8117223B2 (en) | 2007-09-07 | 2012-02-14 | Google Inc. | Integrating external related phrase information into a phrase-based indexing information retrieval system |
US8631027B2 (en) | 2007-09-07 | 2014-01-14 | Google Inc. | Integrated external related phrase information into a phrase-based indexing information retrieval system |
US9122675B1 (en) | 2008-04-22 | 2015-09-01 | West Corporation | Processing natural language grammar |
US20090306964A1 (en) * | 2008-06-06 | 2009-12-10 | Olivier Bonnet | Data detection |
US9454522B2 (en) | 2008-06-06 | 2016-09-27 | Apple Inc. | Detection of data in a sequence of characters |
US8738360B2 (en) * | 2008-06-06 | 2014-05-27 | Apple Inc. | Data detection of a character sequence having multiple possible data types |
US20090326927A1 (en) * | 2008-06-27 | 2009-12-31 | Microsoft Corporation | Adaptive generation of out-of-dictionary personalized long words |
US9411800B2 (en) * | 2008-06-27 | 2016-08-09 | Microsoft Technology Licensing, Llc | Adaptive generation of out-of-dictionary personalized long words |
US20100131274A1 (en) * | 2008-11-26 | 2010-05-27 | At&T Intellectual Property I, L.P. | System and method for dialog modeling |
US11488582B2 (en) | 2008-11-26 | 2022-11-01 | At&T Intellectual Property I, L.P. | System and method for dialog modeling |
US10672381B2 (en) | 2008-11-26 | 2020-06-02 | At&T Intellectual Property I, L.P. | System and method for dialog modeling |
US9129601B2 (en) * | 2008-11-26 | 2015-09-08 | At&T Intellectual Property I, L.P. | System and method for dialog modeling |
US20100169359A1 (en) * | 2008-12-30 | 2010-07-01 | Barrett Leslie A | System, Method, and Apparatus for Information Extraction of Textual Documents |
US20100266557A1 (en) * | 2009-04-13 | 2010-10-21 | Northwestern University | Novel peptide-based scaffolds for cartilage regeneration and methods for their use |
US8450271B2 (en) | 2009-04-13 | 2013-05-28 | Northwestern University | Peptide-based scaffolds for cartilage regeneration and methods for their use |
US8990064B2 (en) | 2009-07-28 | 2015-03-24 | Language Weaver, Inc. | Translating documents based on content |
US8380486B2 (en) | 2009-10-01 | 2013-02-19 | Language Weaver, Inc. | Providing machine-generated translations and corresponding trust levels |
US8676563B2 (en) | 2009-10-01 | 2014-03-18 | Language Weaver, Inc. | Providing human-generated and machine-generated trusted translations |
US8805677B1 (en) * | 2010-02-10 | 2014-08-12 | West Corporation | Processing natural language grammar |
US10984429B2 (en) | 2010-03-09 | 2021-04-20 | Sdl Inc. | Systems and methods for translating textual content |
US20110225104A1 (en) * | 2010-03-09 | 2011-09-15 | Radu Soricut | Predicting the Cost Associated with Translating Textual Content |
US10417646B2 (en) | 2010-03-09 | 2019-09-17 | Sdl Inc. | Predicting the cost associated with translating textual content |
US20110282651A1 (en) * | 2010-05-11 | 2011-11-17 | Microsoft Corporation | Generating snippets based on content features |
US8788260B2 (en) * | 2010-05-11 | 2014-07-22 | Microsoft Corporation | Generating snippets based on content features |
US20120035912A1 (en) * | 2010-07-30 | 2012-02-09 | Ben-Gurion University Of The Negev Research And Development Authority | Multilingual sentence extractor |
US8594998B2 (en) * | 2010-07-30 | 2013-11-26 | Ben-Gurion University Of The Negev Research And Development Authority | Multilingual sentence extractor |
US8838440B2 (en) * | 2010-09-14 | 2014-09-16 | International Business Machines Corporation | Generating parser combination by combining language processing parsers |
US20120065960A1 (en) * | 2010-09-14 | 2012-03-15 | International Business Machines Corporation | Generating parser combination by combining language processing parsers |
US20120109945A1 (en) * | 2010-10-29 | 2012-05-03 | Emilia Maria Lapko | Method and system of improving navigation within a set of electronic documents |
US20120143595A1 (en) * | 2010-12-06 | 2012-06-07 | Xin Li | Fast title/summary extraction from long descriptions |
US9317595B2 (en) * | 2010-12-06 | 2016-04-19 | Yahoo! Inc. | Fast title/summary extraction from long descriptions |
US11501220B2 (en) | 2011-01-07 | 2022-11-15 | Narrative Science Inc. | Automatic generation of narratives from data using communication goals and narrative analytics |
US10755042B2 (en) | 2011-01-07 | 2020-08-25 | Narrative Science Inc. | Automatic generation of narratives from data using communication goals and narrative analytics |
US11044949B2 (en) | 2011-01-29 | 2021-06-29 | Sdl Netherlands B.V. | Systems and methods for dynamic delivery of web content |
US10061749B2 (en) | 2011-01-29 | 2018-08-28 | Sdl Netherlands B.V. | Systems and methods for contextual vocabularies and customer segmentation |
US11694215B2 (en) | 2011-01-29 | 2023-07-04 | Sdl Netherlands B.V. | Systems and methods for managing web content |
US10521492B2 (en) | 2011-01-29 | 2019-12-31 | Sdl Netherlands B.V. | Systems and methods that utilize contextual vocabularies and customer segmentation to deliver web content |
US10657540B2 (en) | 2011-01-29 | 2020-05-19 | Sdl Netherlands B.V. | Systems, methods, and media for web content management |
US11301874B2 (en) | 2011-01-29 | 2022-04-12 | Sdl Netherlands B.V. | Systems and methods for managing web content and facilitating data exchange |
US10990644B2 (en) | 2011-01-29 | 2021-04-27 | Sdl Netherlands B.V. | Systems and methods for contextual vocabularies and customer segmentation |
US10580015B2 (en) | 2011-02-25 | 2020-03-03 | Sdl Netherlands B.V. | Systems, methods, and media for executing and optimizing online marketing initiatives |
US11366792B2 (en) | 2011-02-28 | 2022-06-21 | Sdl Inc. | Systems, methods, and media for generating analytical data |
US10140320B2 (en) | 2011-02-28 | 2018-11-27 | Sdl Inc. | Systems, methods, and media for generating analytical data |
US11003838B2 (en) | 2011-04-18 | 2021-05-11 | Sdl Inc. | Systems and methods for monitoring post translation editing |
US8694303B2 (en) | 2011-06-15 | 2014-04-08 | Language Weaver, Inc. | Systems and methods for tuning parameters in statistical machine translation |
US9984054B2 (en) | 2011-08-24 | 2018-05-29 | Sdl Inc. | Web interface including the review and manipulation of a web document and utilizing permission based control |
US11263390B2 (en) | 2011-08-24 | 2022-03-01 | Sdl Inc. | Systems and methods for informational document review, display and validation |
US8914279B1 (en) * | 2011-09-23 | 2014-12-16 | Google Inc. | Efficient parsing with structured prediction cascades |
US8886515B2 (en) | 2011-10-19 | 2014-11-11 | Language Weaver, Inc. | Systems and methods for enhancing machine translation post edit review processes |
US9336199B2 (en) * | 2011-11-29 | 2016-05-10 | Sk Telecom Co., Ltd. | Automatic sentence evaluation device using shallow parser to automatically evaluate sentence, and error detection apparatus and method of the same |
US20140067379A1 (en) * | 2011-11-29 | 2014-03-06 | Sk Telecom Co., Ltd. | Automatic sentence evaluation device using shallow parser to automatically evaluate sentence, and error detection apparatus and method of the same |
US8942973B2 (en) | 2012-03-09 | 2015-01-27 | Language Weaver, Inc. | Content page URL translation |
US10572928B2 (en) | 2012-05-11 | 2020-02-25 | Fredhopper B.V. | Method and system for recommending products based on a ranking cocktail |
US10402498B2 (en) | 2012-05-25 | 2019-09-03 | Sdl Inc. | Method and system for automatic management of reputation of translators |
US10261994B2 (en) | 2012-05-25 | 2019-04-16 | Sdl Inc. | Method and system for automatic management of reputation of translators |
US20150205786A1 (en) * | 2012-07-31 | 2015-07-23 | Nec Corporation | Problem situation detection device, problem situation detection method and problem situation detection-use program |
US10452740B2 (en) | 2012-09-14 | 2019-10-22 | Sdl Netherlands B.V. | External content libraries |
US11308528B2 (en) | 2012-09-14 | 2022-04-19 | Sdl Netherlands B.V. | Blueprinting of multimedia assets |
US11386186B2 (en) | 2012-09-14 | 2022-07-12 | Sdl Netherlands B.V. | External content library connector systems and methods |
US9336185B1 (en) * | 2012-09-18 | 2016-05-10 | Amazon Technologies, Inc. | Generating an electronic publication sample |
US9916306B2 (en) | 2012-10-19 | 2018-03-13 | Sdl Inc. | Statistical linguistic analysis of source content |
US9152622B2 (en) | 2012-11-26 | 2015-10-06 | Language Weaver, Inc. | Personalized machine translation via online adaptation |
US11921985B2 (en) | 2013-03-15 | 2024-03-05 | Narrative Science Llc | Method and system for configuring automatic generation of narratives from data |
US9501506B1 (en) | 2013-03-15 | 2016-11-22 | Google Inc. | Indexing system |
US11561684B1 (en) | 2013-03-15 | 2023-01-24 | Narrative Science Inc. | Method and system for configuring automatic generation of narratives from data |
US9483568B1 (en) | 2013-06-05 | 2016-11-01 | Google Inc. | Indexing system |
US9355372B2 (en) | 2013-07-03 | 2016-05-31 | Thomson Reuters Global Resources | Method and system for simplifying implicit rhetorical relation prediction in large scale annotated corpus |
WO2015003143A3 (en) * | 2013-07-03 | 2015-05-14 | Thomson Reuters Global Resources | Method and system for simplifying implicit rhetorical relation prediction in large scale annotated corpus |
US9213694B2 (en) | 2013-10-10 | 2015-12-15 | Language Weaver, Inc. | Efficient online domain adaptation |
US9336186B1 (en) * | 2013-10-10 | 2016-05-10 | Google Inc. | Methods and apparatus related to sentence compression |
US10394867B2 (en) | 2014-06-11 | 2019-08-27 | Hewlett-Packard Development Company, L.P. | Functional summarization of non-textual content based on a meta-algorithmic pattern |
WO2015191061A1 (en) * | 2014-06-11 | 2015-12-17 | Hewlett-Packard Development Company, L.P. | Functional summarization of non-textual content based on a meta-algorithmic pattern |
US9582501B1 (en) * | 2014-06-16 | 2017-02-28 | Yseop Sa | Techniques for automatic generation of natural language text |
US11922344B2 (en) | 2014-10-22 | 2024-03-05 | Narrative Science Llc | Automatic generation of narratives from data using communication goals and narrative analytics |
US20180011833A1 (en) * | 2015-02-02 | 2018-01-11 | National Institute Of Information And Communications Technology | Syntax analyzing device, learning device, machine translation device and storage medium |
US9767193B2 (en) * | 2015-03-27 | 2017-09-19 | Fujitsu Limited | Generation apparatus and method |
US20160283588A1 (en) * | 2015-03-27 | 2016-09-29 | Fujitsu Limited | Generation apparatus and method |
JP2016186772A (ja) * | 2015-03-27 | 2016-10-27 | Fujitsu Limited | Shortened-sentence generation apparatus, method, and program |
US11080493B2 (en) | 2015-10-30 | 2021-08-03 | Sdl Limited | Translation review workflow systems and methods |
US10614167B2 (en) | 2015-10-30 | 2020-04-07 | Sdl Plc | Translation review workflow systems and methods |
US11238090B1 (en) | 2015-11-02 | 2022-02-01 | Narrative Science Inc. | Applied artificial intelligence technology for using narrative analytics to automatically generate narratives from visualization data |
US11232268B1 (en) | 2015-11-02 | 2022-01-25 | Narrative Science Inc. | Applied artificial intelligence technology for using narrative analytics to automatically generate narratives from line charts |
US11222184B1 (en) | 2015-11-02 | 2022-01-11 | Narrative Science Inc. | Applied artificial intelligence technology for using narrative analytics to automatically generate narratives from bar charts |
US11188588B1 (en) | 2015-11-02 | 2021-11-30 | Narrative Science Inc. | Applied artificial intelligence technology for using narrative analytics to interactively generate narratives from visualization data |
US11170038B1 (en) | 2015-11-02 | 2021-11-09 | Narrative Science Inc. | Applied artificial intelligence technology for using narrative analytics to automatically generate narratives from multiple visualizations |
US10853583B1 (en) | 2016-08-31 | 2020-12-01 | Narrative Science Inc. | Applied artificial intelligence technology for selective control over narrative generation from visualizations of data |
US11144838B1 (en) | 2016-08-31 | 2021-10-12 | Narrative Science Inc. | Applied artificial intelligence technology for evaluating drivers of data presented in visualizations |
US11341338B1 (en) | 2016-08-31 | 2022-05-24 | Narrative Science Inc. | Applied artificial intelligence technology for interactively using narrative analytics to focus and control visualizations of data |
US10685189B2 (en) * | 2016-11-17 | 2020-06-16 | Goldman Sachs & Co. LLC | System and method for coupled detection of syntax and semantics for natural language understanding and generation |
AU2021245127B2 (en) * | 2016-11-17 | 2021-11-18 | Goldman Sachs & Co. LLC | System and method for coupled detection of syntax and semantics for natural language understanding and generation |
US11138389B2 (en) | 2016-11-17 | 2021-10-05 | Goldman Sachs & Co. LLC | System and method for coupled detection of syntax and semantics for natural language understanding and generation |
US11068661B1 (en) | 2017-02-17 | 2021-07-20 | Narrative Science Inc. | Applied artificial intelligence technology for narrative generation based on smart attributes |
US11568148B1 (en) | 2017-02-17 | 2023-01-31 | Narrative Science Inc. | Applied artificial intelligence technology for narrative generation based on explanation communication goals |
US11954445B2 (en) | 2017-02-17 | 2024-04-09 | Narrative Science Llc | Applied artificial intelligence technology for narrative generation based on explanation communication goals |
US10943069B1 (en) | 2017-02-17 | 2021-03-09 | Narrative Science Inc. | Applied artificial intelligence technology for narrative generation based on a conditional outcome framework |
US10762304B1 (en) | 2017-02-17 | 2020-09-01 | Narrative Science | Applied artificial intelligence technology for performing natural language generation (NLG) using composable communication goals and ontologies to generate narrative stories |
US10719542B1 (en) | 2017-02-17 | 2020-07-21 | Narrative Science Inc. | Applied artificial intelligence technology for ontology building to support natural language generation (NLG) using composable communication goals |
US11562146B2 (en) | 2017-02-17 | 2023-01-24 | Narrative Science Inc. | Applied artificial intelligence technology for narrative generation based on a conditional outcome framework |
US20190138595A1 (en) * | 2017-05-10 | 2019-05-09 | Oracle International Corporation | Enabling chatbots by detecting and supporting affective argumentation |
US11347946B2 (en) * | 2017-05-10 | 2022-05-31 | Oracle International Corporation | Utilizing discourse structure of noisy user-generated content for chatbot learning |
US20210165969A1 (en) * | 2017-05-10 | 2021-06-03 | Oracle International Corporation | Detection of deception within text using communicative discourse trees |
US11748572B2 (en) * | 2017-05-10 | 2023-09-05 | Oracle International Corporation | Enabling chatbots by validating argumentation |
US11615145B2 (en) | 2017-05-10 | 2023-03-28 | Oracle International Corporation | Converting a document into a chatbot-accessible form via the use of communicative discourse trees |
US11875118B2 (en) * | 2017-05-10 | 2024-01-16 | Oracle International Corporation | Detection of deception within text using communicative discourse trees |
US20210042473A1 (en) * | 2017-05-10 | 2021-02-11 | Oracle International Corporation | Enabling chatbots by validating argumentation |
US20180329879A1 (en) * | 2017-05-10 | 2018-11-15 | Oracle International Corporation | Enabling rhetorical analysis via the use of communicative discourse trees |
US10679011B2 (en) * | 2017-05-10 | 2020-06-09 | Oracle International Corporation | Enabling chatbots by detecting and supporting argumentation |
US20210049329A1 (en) * | 2017-05-10 | 2021-02-18 | Oracle International Corporation | Enabling rhetorical analysis via the use of communicative discourse trees |
US11960844B2 (en) * | 2017-05-10 | 2024-04-16 | Oracle International Corporation | Discourse parsing using semantic and syntactic relations |
US10796102B2 (en) * | 2017-05-10 | 2020-10-06 | Oracle International Corporation | Enabling rhetorical analysis via the use of communicative discourse trees |
US11373632B2 (en) * | 2017-05-10 | 2022-06-28 | Oracle International Corporation | Using communicative discourse trees to create a virtual persuasive dialogue |
US10853581B2 (en) * | 2017-05-10 | 2020-12-01 | Oracle International Corporation | Enabling rhetorical analysis via the use of communicative discourse trees |
US20200380214A1 (en) * | 2017-05-10 | 2020-12-03 | Oracle International Corporation | Enabling rhetorical analysis via the use of communicative discourse trees |
US11586827B2 (en) * | 2017-05-10 | 2023-02-21 | Oracle International Corporation | Generating desired discourse structure from an arbitrary text |
US11775771B2 (en) * | 2017-05-10 | 2023-10-03 | Oracle International Corporation | Enabling rhetorical analysis via the use of communicative discourse trees |
US20200410166A1 (en) * | 2017-05-10 | 2020-12-31 | Oracle International Corporation | Enabling chatbots by detecting and supporting affective argumentation |
US10839154B2 (en) * | 2017-05-10 | 2020-11-17 | Oracle International Corporation | Enabling chatbots by detecting and supporting affective argumentation |
US11783126B2 (en) * | 2017-05-10 | 2023-10-10 | Oracle International Corporation | Enabling chatbots by detecting and supporting affective argumentation |
US10817670B2 (en) * | 2017-05-10 | 2020-10-27 | Oracle International Corporation | Enabling chatbots by validating argumentation |
US20190272323A1 (en) * | 2017-05-10 | 2019-09-05 | Oracle International Corporation | Enabling chatbots by validating argumentation |
US10599885B2 (en) * | 2017-05-10 | 2020-03-24 | Oracle International Corporation | Utilizing discourse structure of noisy user-generated content for chatbot learning |
US11694037B2 (en) * | 2017-05-10 | 2023-07-04 | Oracle International Corporation | Enabling rhetorical analysis via the use of communicative discourse trees |
US20220318513A9 (en) * | 2017-05-10 | 2022-10-06 | Oracle International Corporation | Discourse parsing using semantic and syntactic relations |
US20220284194A1 (en) * | 2017-05-10 | 2022-09-08 | Oracle International Corporation | Using communicative discourse trees to detect distributed incompetence |
US11386274B2 (en) * | 2017-05-10 | 2022-07-12 | Oracle International Corporation | Using communicative discourse trees to detect distributed incompetence |
US20180365228A1 (en) * | 2017-06-15 | 2018-12-20 | Oracle International Corporation | Tree kernel learning for text classification into classes of intent |
US10839161B2 (en) * | 2017-06-15 | 2020-11-17 | Oracle International Corporation | Tree kernel learning for text classification into classes of intent |
US11100144B2 (en) | 2017-06-15 | 2021-08-24 | Oracle International Corporation | Data loss prevention system for cloud security based on document discourse analysis |
US11580144B2 (en) * | 2017-09-27 | 2023-02-14 | Oracle International Corporation | Search indexing using discourse trees |
US20220035845A1 (en) * | 2017-09-27 | 2022-02-03 | Oracle International Corporation | Search indexing using discourse trees |
US11182412B2 (en) * | 2017-09-27 | 2021-11-23 | Oracle International Corporation | Search indexing using discourse trees |
US10796099B2 (en) | 2017-09-28 | 2020-10-06 | Oracle International Corporation | Enabling autonomous agents to discriminate between questions and requests |
US10853574B2 (en) | 2017-09-28 | 2020-12-01 | Oracle International Corporation | Navigating electronic documents using domain discourse trees |
WO2019067869A1 (en) * | 2017-09-28 | 2019-04-04 | Oracle International Corporation | Determining cross document rhetorical relationships based on parsing and identification of named entities |
US11599724B2 (en) | 2017-09-28 | 2023-03-07 | Oracle International Corporation | Enabling autonomous agents to discriminate between questions and requests |
US11797773B2 (en) | 2017-09-28 | 2023-10-24 | Oracle International Corporation | Navigating electronic documents using domain discourse trees |
US11809825B2 (en) | 2017-09-28 | 2023-11-07 | Oracle International Corporation | Management of a focused information sharing dialogue based on discourse trees |
US11295085B2 (en) | 2017-09-28 | 2022-04-05 | Oracle International Corporation | Navigating electronic documents using domain discourse trees |
US11321540B2 (en) | 2017-10-30 | 2022-05-03 | Sdl Inc. | Systems and methods of adaptive automated translation utilizing fine-grained alignment |
US10635863B2 (en) | 2017-10-30 | 2020-04-28 | Sdl Inc. | Fragment recall and adaptive automated translation |
US10817676B2 (en) | 2017-12-27 | 2020-10-27 | Sdl Inc. | Intelligent routing services and systems |
US11475227B2 (en) | 2017-12-27 | 2022-10-18 | Sdl Inc. | Intelligent routing services and systems |
US11816438B2 (en) | 2018-01-02 | 2023-11-14 | Narrative Science Inc. | Context saliency-based deictic parser for natural language processing |
US11042708B1 (en) | 2018-01-02 | 2021-06-22 | Narrative Science Inc. | Context saliency-based deictic parser for natural language generation |
US11042709B1 (en) | 2018-01-02 | 2021-06-22 | Narrative Science Inc. | Context saliency-based deictic parser for natural language processing |
US11023689B1 (en) | 2018-01-17 | 2021-06-01 | Narrative Science Inc. | Applied artificial intelligence technology for narrative generation using an invocable analysis service with analysis libraries |
US11561986B1 (en) | 2018-01-17 | 2023-01-24 | Narrative Science Inc. | Applied artificial intelligence technology for narrative generation using an invocable analysis service |
US10963649B1 (en) | 2018-01-17 | 2021-03-30 | Narrative Science Inc. | Applied artificial intelligence technology for narrative generation using an invocable analysis service and configuration-driven analytics |
US11003866B1 (en) | 2018-01-17 | 2021-05-11 | Narrative Science Inc. | Applied artificial intelligence technology for narrative generation using an invocable analysis service and data re-organization |
US11537645B2 (en) * | 2018-01-30 | 2022-12-27 | Oracle International Corporation | Building dialogue structure by using communicative discourse trees |
US11694040B2 (en) | 2018-01-30 | 2023-07-04 | Oracle International Corporation | Using communicative discourse trees to detect a request for an explanation |
US20230094841A1 (en) * | 2018-01-30 | 2023-03-30 | Oracle International Corporation | Building dialogue structure by using communicative discourse trees |
US10949623B2 (en) | 2018-01-30 | 2021-03-16 | Oracle International Corporation | Using communicative discourse trees to detect a request for an explanation |
US11782985B2 (en) | 2018-05-09 | 2023-10-10 | Oracle International Corporation | Constructing imaginary discourse trees to improve answering convergent questions |
WO2019217722A1 (en) * | 2018-05-09 | 2019-11-14 | Oracle International Corporation | Constructing imaginary discourse trees to improve answering convergent questions |
US11455494B2 (en) | 2018-05-30 | 2022-09-27 | Oracle International Corporation | Automated building of expanded datasets for training of autonomous agents |
US11334726B1 (en) | 2018-06-28 | 2022-05-17 | Narrative Science Inc. | Applied artificial intelligence technology for using natural language processing to train a natural language generation system with respect to date and number textual features |
US10706236B1 (en) * | 2018-06-28 | 2020-07-07 | Narrative Science Inc. | Applied artificial intelligence technology for using natural language processing and concept expression templates to train a natural language generation system |
US11232270B1 (en) | 2018-06-28 | 2022-01-25 | Narrative Science Inc. | Applied artificial intelligence technology for using natural language processing to train a natural language generation system with respect to numeric style features |
US11042713B1 (en) | 2018-06-28 | 2021-06-22 | Narrative Science Inc. | Applied artificial intelligence technology for using natural language processing to train a natural language generation system |
US11645459B2 (en) * | 2018-07-02 | 2023-05-09 | Oracle International Corporation | Social autonomous agent implementation using lattice queries and relevancy detection |
US11256867B2 (en) | 2018-10-09 | 2022-02-22 | Sdl Inc. | Systems and methods of machine learning for digital assets and message creation |
US11562135B2 (en) * | 2018-10-16 | 2023-01-24 | Oracle International Corporation | Constructing conclusive answers for autonomous agents |
US20200117709A1 (en) * | 2018-10-16 | 2020-04-16 | Oracle International Corporation | Constructing conclusive answers for autonomous agents |
US11720749B2 (en) | 2018-10-16 | 2023-08-08 | Oracle International Corporation | Constructing conclusive answers for autonomous agents |
US10990767B1 (en) | 2019-01-28 | 2021-04-27 | Narrative Science Inc. | Applied artificial intelligence technology for adaptive natural language understanding |
US11341330B1 (en) | 2019-01-28 | 2022-05-24 | Narrative Science Inc. | Applied artificial intelligence technology for adaptive natural language understanding with term discovery |
US20220222444A1 (en) * | 2019-02-13 | 2022-07-14 | Oracle International Corporation | Chatbot conducting a virtual social dialogue |
US11861319B2 (en) * | 2019-02-13 | 2024-01-02 | Oracle International Corporation | Chatbot conducting a virtual social dialogue |
US11321536B2 (en) * | 2019-02-13 | 2022-05-03 | Oracle International Corporation | Chatbot conducting a virtual social dialogue |
US11334720B2 (en) | 2019-04-17 | 2022-05-17 | International Business Machines Corporation | Machine learned sentence span inclusion judgments |
US11551008B2 (en) | 2019-04-28 | 2023-01-10 | Beijing Xiaomi Intelligent Technology Co., Ltd. | Method and device for text processing |
US11275892B2 (en) | 2019-04-29 | 2022-03-15 | International Business Machines Corporation | Traversal-based sentence span judgements |
US11263394B2 (en) * | 2019-08-02 | 2022-03-01 | Adobe Inc. | Low-resource sentence compression system |
US11449682B2 (en) | 2019-08-29 | 2022-09-20 | Oracle International Corporation | Adjusting chatbot conversation to user personality and mood |
US11556698B2 (en) * | 2019-10-22 | 2023-01-17 | Oracle International Corporation | Augmenting textual explanations with complete discourse trees |
US11580298B2 (en) | 2019-11-14 | 2023-02-14 | Oracle International Corporation | Detecting hypocrisy in text |
US11880652B2 (en) | 2019-11-14 | 2024-01-23 | Oracle International Corporation | Detecting hypocrisy in text |
US11501085B2 (en) | 2019-11-20 | 2022-11-15 | Oracle International Corporation | Employing abstract meaning representation to lay the last mile towards reading comprehension |
US11741316B2 (en) | 2019-11-20 | 2023-08-29 | Oracle International Corporation | Employing abstract meaning representation to lay the last mile towards reading comprehension |
US11775772B2 (en) | 2019-12-05 | 2023-10-03 | Oracle International Corporation | Chatbot providing a defeating reply |
US11847420B2 (en) | 2020-03-05 | 2023-12-19 | Oracle International Corporation | Conversational explainability |
US11074402B1 (en) * | 2020-04-07 | 2021-07-27 | International Business Machines Corporation | Linguistically consistent document annotation |
US11475210B2 (en) * | 2020-08-31 | 2022-10-18 | Twilio Inc. | Language model for abstractive summarization |
US11941348B2 (en) * | 2020-08-31 | 2024-03-26 | Twilio Inc. | Language model for abstractive summarization |
US20220414319A1 (en) * | 2020-08-31 | 2022-12-29 | Twilio Inc. | Language model for abstractive summarization |
US11765267B2 (en) | 2020-12-31 | 2023-09-19 | Twilio Inc. | Tool for annotating and reviewing audio conversations |
US11809804B2 (en) | 2021-05-26 | 2023-11-07 | Twilio Inc. | Text formatter |
Also Published As
Publication number | Publication date |
---|---|
US20020040292A1 (en) | 2002-04-04 |
AU2001261506A1 (en) | 2001-11-20 |
WO2001086491A3 (en) | 2003-08-14 |
CN1465018A (zh) | 2003-12-31 |
US7533013B2 (en) | 2009-05-12 |
WO2001086489A2 (en) | 2001-11-15 |
CA2408819C (en) | 2006-11-07 |
JP2004501429A (ja) | 2004-01-15 |
EP1352338A2 (en) | 2003-10-15 |
CA2408819A1 (en) | 2001-11-15 |
WO2001086491A2 (en) | 2001-11-15 |
WO2001086489A3 (en) | 2003-07-24 |
AU2001261505A1 (en) | 2001-11-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20020046018A1 (en) | Discourse parsing and summarization | |
Knight et al. | Summarization beyond sentence extraction: A probabilistic approach to sentence compression | |
Marcu | A decision-based approach to rhetorical parsing | |
Weiss et al. | Fundamentals of predictive text mining | |
Knight et al. | Statistics-based summarization-step one: Sentence compression | |
Harabagiu et al. | Topic themes for multi-document summarization | |
EP1899835B1 (en) | Processing collocation mistakes in documents | |
Zhang et al. | Keyword extraction using support vector machine | |
US7970600B2 (en) | Using a first natural language parser to train a second parser | |
US7584092B2 (en) | Unsupervised learning of paraphrase/translation alternations and selective application thereof | |
US6115683A (en) | Automatic essay scoring system using content-based techniques | |
US9430742B2 (en) | Method and apparatus for extracting entity names and their relations | |
Azmi et al. | A text summarizer for Arabic | |
US7398196B1 (en) | Method and apparatus for summarizing multiple documents using a subsumption model | |
Gulati et al. | A novel technique for multidocument Hindi text summarization | |
Alami et al. | Impact of stemming on Arabic text summarization | |
Tomar et al. | Probabilistic latent semantic analysis for unsupervised word sense disambiguation | |
Charoenpornsawat et al. | Automatic sentence break disambiguation for Thai | |
JP4143085B2 (ja) | Synonym acquisition method, apparatus, program, and computer-readable recording medium |
Bahloul et al. | ArA* summarizer: An Arabic text summarization system based on subtopic segmentation and using an A* algorithm for reduction | |
Das et al. | Extracting emotion topics from blog sentences: use of voting from multi-engine supervised classifiers | |
Moghadam et al. | Comparative study of various Persian stemmers in the field of information retrieval | |
Guerram et al. | A domain independent approach for ontology semantic enrichment | |
Batawalaarachchi | Automated title generation in Sinhala language |
Riaz | Improving Search via Named Entity Recognition in Morphologically Rich Languages–A Case Study in Urdu |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: SOUTHERN CALIFORNIA, UNIVERSITY OF, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MARCU, DANIEL;KNIGHT, KEVIN;REEL/FRAME:011804/0846 Effective date: 20010511 |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |