EP2183684A2 - Coreference resolution in an ambiguity-sensitive natural language processing system - Google Patents
- Publication number
- EP2183684A2 (application EP08828084A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- coreference
- text
- fact
- computer
- ambiguity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Definitions
- a second phrase can be an anaphor which is anaphoric to a first phrase.
- the first phrase is the antecedent of the second phrase.
- Knowledge of the referent of the antecedent may be necessary to determine the referent of the anaphor.
- the general task of finding coreferential expressions, anaphors, and their antecedents within a document can be referred to as coreference resolution.
- Coreference resolution is the process of establishing that two expressions refer to the same referent, without necessarily establishing what that referent is.
- Reference resolution is the process of establishing what the referent is.
- the expressions can be referred to as aliases of one another. According to the example above, the expressions “Pablo Picasso,” “the Spanish painter,” “his,” “he,” and “Picasso” form an alias cluster referring to Picasso.
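One way to picture how an alias cluster forms is to treat every pairwise coreference decision as a link and take the connected components. The Python sketch below does this with a union-find (disjoint-set) structure. The function name `build_alias_clusters`, the list-of-mentions input, the index-pair link format, and the extra mention “Guernica” are illustrative assumptions, not the method described in this document.

```python
from collections import defaultdict


def build_alias_clusters(mentions, links):
    """Group mentions into alias clusters from pairwise coreference links.

    mentions: mention strings in document order (hypothetical input format).
    links: (i, j) index pairs that a resolver judged coreferent.
    Returns clusters as lists of mentions, each in document order.
    """
    parent = list(range(len(mentions)))

    def find(i):
        # Walk up to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in links:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Keep the earlier mention as the root of the merged cluster.
            parent[max(root_i, root_j)] = min(root_i, root_j)

    clusters = defaultdict(list)
    for index, mention in enumerate(mentions):
        clusters[find(index)].append(mention)
    return list(clusters.values())


mentions = ["Pablo Picasso", "the Spanish painter", "his", "Guernica", "he", "Picasso"]
links = [(0, 1), (1, 2), (2, 4), (4, 5)]
clusters = build_alias_clusters(mentions, links)
print(clusters)
# [['Pablo Picasso', 'the Spanish painter', 'his', 'he', 'Picasso'], ['Guernica']]
```

Union-find keeps each merge nearly constant-time, so clusters can be built incrementally as a resolver emits decisions.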
- Natural language expressions often display ambiguity. Ambiguity occurs when an expression can be interpreted with more than one meaning. For example, the sentence “The duck is ready to eat” can be interpreted as asserting either that the duck is properly cooked or that the duck is hungry and needs to be fed.
- Coreference resolution and ambiguity resolution are two examples of natural language processing operations that can be used to mechanically support language as commonly expressed by human users.
- Information processing systems, such as text indexing and querying in support of information searching, may benefit from increased application of natural language processing.
- information provided by a coreference resolution system can be integrated into, and improve the performance of, a natural language processing system.
- An example of such a natural language processing system is a document indexing and retrieval system.
- ambiguity awareness features can operate in coordination with coreference resolution within a natural language processing system.
- Annotation of coreference entities, as well as of ambiguous interpretations, can be supported by in-line markup within text expressions or, alternatively, by external entity maps.
- facts can be extracted from text to be indexed. Information expressed within the text can be formally organized in terms of facts. Used in this sense, a fact can be any information contained in the text, and need not necessarily be true.
- a fact may be represented as a relationship between entities.
- a fact can be stored in a semantic index as a relationship between entities stored within the index.
- a document can be retrieved if it contains a fact that matches a fact determined through analysis of the query.
- a process of expansion can support applying multiple aliases, or ambiguities, to an entity being indexed. Such expansion can support additional possible references, or interpretations, for a given entity being captured into the semantic index.
- Alternative stored descriptions can support retrieval of a fact by either the original description or a coreferential description.
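A minimal sketch of this expansion, assuming facts are simple (subject, relation, object) triples and that each document's alias clusters are already known: every fact is stored under each alias of its subject and object, so a query naming “Picasso” retrieves a document that only said “he.” The names `index_facts` and `query_facts` and the triple format are hypothetical.

```python
from collections import defaultdict


def index_facts(index, doc_id, facts, alias_clusters):
    """Store each (subject, relation, object) fact under every alias of its arguments.

    alias_clusters: lists of coreferent mentions found in this document only,
    so a pronoun such as "he" expands differently in each document.
    """
    alias_of = {}
    for cluster in alias_clusters:
        for mention in cluster:
            alias_of[mention] = cluster
    for subject, relation, obj in facts:
        for s in alias_of.get(subject, [subject]):
            for o in alias_of.get(obj, [obj]):
                index[(s, relation, o)].add(doc_id)


def query_facts(index, subject=None, relation=None, obj=None):
    """Return ids of documents holding a matching fact; None acts as a wildcard."""
    hits = set()
    for (s, r, o), doc_ids in index.items():
        if subject in (None, s) and relation in (None, r) and obj in (None, o):
            hits |= doc_ids
    return hits


index = defaultdict(set)
index_facts(index, "doc1", [("he", "paint", "Guernica")], [["Pablo Picasso", "Picasso", "he"]])
print(query_facts(index, subject="Picasso", relation="paint"))  # {'doc1'}
print(query_facts(index, subject="Pablo Picasso", obj="Guernica"))  # {'doc1'}
```

The trade-off is index size: each fact is stored once per combination of subject and object aliases, in exchange for exact-match lookups at query time.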
- FIGURE 1 is a network architecture diagram illustrating an information search system according to aspects of an embodiment presented herein;
- FIGURE 2 is a functional block diagram illustrating various components of a natural language index and query system according to aspects of an embodiment presented herein;
- FIGURE 3 is a functional block diagram illustrating coreference resolution and ambiguity resolution within a natural language processing system according to aspects of an embodiment presented herein;
- FIGURE 4 is a logical flow diagram illustrating aspects of processes for ambiguity-sensitive indexing with coreference resolution according to aspects of an embodiment presented herein; and
- FIGURE 5 is a computer architecture diagram showing an illustrative computer hardware and software architecture for a computing system capable of implementing aspects of an embodiment presented herein.
DETAILED DESCRIPTION
- coreference resolution functionality can be integrated into a natural language processing system that processes documents to be indexed for use in an information search and retrieval system. This integration can enhance the index with information supporting coreference resolution for natural language documents being indexed.
- program modules include routines, programs, components, data structures, and other types of structures that perform particular tasks or implement particular abstract data types.
- a network architecture diagram 100 illustrates an information search system according to aspects of an embodiment presented herein.
- Client computers 110A-110D can interface through a network 140 to a server 120 to obtain information associated with a natural language engine 130. While four client computers 110A-110D are illustrated, it should be appreciated that any number of client computers 110A-110D may be in use.
- the client computers 110A-110D may be geographically distributed across a network 140, collocated, or any combination thereof.
- While a single server 120 is illustrated, it should be appreciated that the functionality of the server 120 may be distributed over any number of multiple servers 120. Such multiple servers 120 may be collocated, geographically distributed across a network 140, or any combination thereof.
- the natural language engine 130 may support search engine functionality.
- a user query may be issued from a client computer 110A-110D through the network 140 and on to the server 120.
- the user query may be in a natural language format.
- the natural language engine 130 may process the natural language query to support a search based upon syntax and semantics extracted from the natural language query. Results of such a search may be provided from the server 120 through the network 140 back to the client computers 110A-110D.
- One or more search indexes may be stored at, or in association with, the server 120.
- Information in a search index may be populated from a set of source information, or a corpus.
- content may be collected and indexed from various web sites on various web servers (not illustrated) across the network 140.
- Such collection and indexing may be performed by software executing on the server 120, or on another computer (not illustrated).
- the collection may be performed by web crawlers or spider applications.
- the natural language engine 130 may be applied to the collected information such that natural language content collected from the corpus may be indexed based on syntax and semantics extracted by the natural language engine 130. Indexing and searching are discussed in further detail with respect to FIGURE 2.
- the client computers 110A-110D may act as terminal clients, hypertext browser clients, graphical display clients, or other networked clients to the server 120.
- a web browser application at the client computers 110A-110D may support interfacing with a web server application at the server 120.
- Such a browser may use controls, plug-ins, or applets to support interfacing to the server 120.
- the client computers 110A-110D can also use other customized programs, applications, or modules to interface with the server 120.
- the client computers 110A-110D can be desktop computers, laptops, handhelds, mobile terminals, mobile telephones, television set-top boxes, kiosks, servers, terminals, thin-clients, or any other computerized devices.
- the network 140 may be any communications network capable of supporting communications between the client computers 110A-110D and the server 120.
- the network 140 may be wired, wireless, optical, radio, packet switched, circuit switched, or any combination thereof.
- the network 140 may use any topology, and links of the network 140 may support any networking technology, protocol, or bandwidth such as Ethernet, DSL, cable modem, ATM, SONET, MPLS, PSTN, POTS modem, PONS, HFC, satellite, ISDN, WiFi, WiMax, mobile cellular, any combination thereof, or any other data interconnection or networking mechanism.
- the network 140 may be an intranet, an internet, the Internet, the World Wide Web, a LAN, a WAN, a MAN, or any other network for interconnecting computer systems.
- the natural language engine 130 can be operated locally.
- a server 120 and a client computer 110A-110D may be combined onto a single computing device.
- Such a combined system can support search indexes stored locally or remotely.
- Referring to FIGURE 2, a functional block diagram illustrates various components of a natural language engine 130 according to one exemplary embodiment.
- the natural language engine 130 can support information searches. In order to support such searches, a content acquisition process 200 is performed. Operations related to content acquisition 200 extract information from documents provided as text content 210. This information can be stored in a semantic index 250 that can be used for searching. Operations related to a user search 205 can support processing of a user entered search query. The user query can take the form of a natural language question 260. The natural language engine 130 can analyze the user input to translate a query into a representation to be compared with information represented within the semantic index 250.
- the content and structuring of information in the semantic index 250 can support rapid matching and retrieval of documents, or portions of documents, that are relevant to the meaning of the query or natural language question 260.
- the text content 210 may comprise documents in a very general sense. Examples of such documents can include web pages, textual documents, scanned documents, databases, information listings, other Internet content, or any other information source. This text content 210 can provide a corpus of information to be searched. Processing the text content 210 can occur in two stages: syntactic parsing 215 and semantic mapping 225. Preliminary language processing steps may occur before, or at the beginning of, parsing 215. For example, the text content 210 may be separated at sentence boundaries.
- Parsing 215 may be performed by a syntactic analysis system, such as the Xerox Linguistic Environment (XLE), provided here only as a general example, but not to limit possible implementations of this description.
- the parser 215 can convert sentences to representations that make explicit the syntactic relations among words.
- the parser 215 can apply a grammar 220 associated with the specific language in use.
- the parser 215 can apply a grammar 220 for English.
- the grammar 220 may be formalized, for example, as a lexical functional grammar (LFG) or other suitable parsing mechanism such as those based on Head-Driven Phrase Structure Grammar (HPSG), Combinatory Categorial Grammar (CCG), Probabilistic Context-free Grammar (PCFG) or any other grammar formalism.
- the grammar 220 can specify possible ways for constructing meaningful sentences in a given language.
- the parser 215 may apply the rules of the grammar 220 to the strings of the text content 210.
- a grammar 220 may be provided for various languages. For example, LFG grammars have been created for English, French, German, Chinese, and Japanese. Other grammars may be provided as well.
- a grammar 220 may be developed by manual acquisition, where grammatical rules are defined by a linguist or dictionary writer. Alternatively, machine learning acquisition can involve the automated observation and analysis of many examples of text from a large corpus to automatically determine grammatical rules. A combination of manual definition and machine learning may also be used in acquiring the rules of a grammar 220.
- the parser 215 can apply the grammar 220 to the text content 210 to determine the syntactic structure.
- the syntactic structures consist of constituent structures (c-structures) and functional structures (f-structures).
- the c-structure can represent a hierarchy of constituent phrases and words.
- the f-structure can encode roles and relationships between the various constituents of the c-structure.
- the f-structure can also represent information derived from the forms of the words. For example, the plurality of a noun or the tense of a verb may be specified in the f-structure.
- In the semantic mapping process 225 that follows the parsing process 215, information can be extracted from the syntactic structures and combined with information about the meanings of the words in the sentence.
- a semantic map or semantic representation of a sentence can be provided as content semantics 240.
- Semantic mapping 225 can augment the syntactic relationships provided by the parser 215 with conceptual properties of individual words. The results can be transformed into representations of the meaning of sentences from the text content 210.
- Semantic mapping 225 can determine roles played by words in a sentence: for example, the subject performing an action, something used to carry out the action, or something being affected by the action. For the purposes of search indexing, words can be stored in a semantic index 250 along with their roles.
- Semantic mapping 225 can support disambiguation of terms, determination of antecedent relationships, and expansion of terms by synonym, hypernym, or hyponym.
- Semantic mapping 225 can apply knowledge resources 230 as rules and techniques for extracting semantics from sentences. The knowledge resources can be acquired through both manual definition and machine learning, as discussed with respect to acquisition of grammars 220. The semantic mapping 225 process can provide content semantics 240 in a semantic extensible markup language (semantic XML or semxml) representation.
- Content semantics 240 can specify roles played by words in the sentences of the text content 210.
- the content semantics 240 can be provided to an indexing process 245.
- An index can support representing a large corpus of information so that the locations of words and phrases can be rapidly identified within the index.
- a traditional search engine may use keywords as search terms such that the index maps from keywords specified by a user to articles or documents where those keywords appear.
- the semantic index 250 can represent the semantic meanings of words in addition to the words themselves. Semantic relationships can be assigned to words during both content acquisition 200 and user search 205. Queries against the semantic index 250 can be based on not only words, but words in specific roles. The roles are those played by the word in the sentence or phrase as stored in the semantic index 250.
- the semantic index 250 can be considered an inverted index, that is, a rapidly searchable database whose entries are semantic words (i.e., a word in a given role) with pointers to the documents, or web pages, on which those words occur.
- the semantic index 250 can support hybrid indexing. Such hybrid indexing can combine features and functions of both keyword indexing and semantic indexing.
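The inverted index described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the format of the semantic index 250: each document is reduced to a list of (word, role) pairs such as semantic mapping might produce, and the role labels `agent`, `predicate`, and `theme` are hypothetical. A role-agnostic lookup over the same keys approximates the keyword half of hybrid indexing.

```python
from collections import defaultdict


def build_semantic_index(documents):
    """Map each (word, role) pair to the set of documents containing it.

    documents: doc_id -> list of (word, role) pairs (a simplified, hypothetical format).
    """
    index = defaultdict(set)
    for doc_id, semantic_words in documents.items():
        for word, role in semantic_words:
            index[(word.lower(), role)].add(doc_id)
    return index


def keyword_lookup(index, word):
    """Ignore roles, approximating a plain keyword index over the same entries."""
    word = word.lower()
    hits = set()
    for (indexed_word, _role), doc_ids in index.items():
        if indexed_word == word:
            hits |= doc_ids
    return hits


docs = {
    "d1": [("Picasso", "agent"), ("paint", "predicate"), ("Guernica", "theme")],
    "d2": [("Guernica", "agent"), ("inspire", "predicate"), ("Picasso", "theme")],
}
index = build_semantic_index(docs)
print(index.get(("picasso", "agent"), set()))  # {'d1'}
print(sorted(keyword_lookup(index, "Picasso")))  # ['d1', 'd2']
```

A role-qualified lookup separates “Picasso as agent” from “Picasso as theme,” while the keyword lookup returns both documents.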
- User entry of queries can be supported in the form of natural language questions 260.
- the query can be analyzed through a natural language pipeline similar, or identical, to that used in content acquisition 200. That is, the natural language question 260 can be processed by a parser 265 to extract syntactic structure. Following syntactic parsing 265, the natural language question 260 can be processed for semantic mapping 270.
- the semantic mapping 270 can provide question semantics 275 to be used in a retrieval process 280 against the semantic index 250 as discussed above.
- the retrieval process 280 can support hybrid index queries where both keyword index retrieval and semantic index retrieval may be provided alone or in combination.
- results of retrieval 280 from the semantic index 250 along with the question semantics 275 can inform a ranking process 285.
- Ranking can leverage both keyword and semantic information.
- the results obtained by retrieval 280 can be ordered by various metrics in an attempt to place the most desirable results closer to the top of the retrieved information to be provided to the user as a result presentation 290.
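As one illustration of ranking that leverages both kinds of information, the sketch below scores each document by a weighted count of query keywords it contains plus (word, role) pairs it shares with the question semantics. The data layout, the function name `rank_documents`, and the weights are assumptions chosen for the example; the metrics themselves are not specified here.

```python
def rank_documents(doc_terms, query_keywords, query_semantics,
                   keyword_weight=1.0, semantic_weight=2.0):
    """Order document ids by a weighted sum of keyword and semantic matches.

    doc_terms: doc_id -> {"keywords": set of words, "semantics": set of (word, role)}.
    Ties are broken by document id so the order is deterministic.
    """
    scores = {}
    for doc_id, terms in doc_terms.items():
        keyword_hits = len(query_keywords & terms["keywords"])
        semantic_hits = len(query_semantics & terms["semantics"])
        scores[doc_id] = keyword_weight * keyword_hits + semantic_weight * semantic_hits
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


doc_terms = {
    "d1": {"keywords": {"picasso", "paint", "guernica"},
           "semantics": {("picasso", "agent"), ("paint", "predicate"), ("guernica", "theme")}},
    "d2": {"keywords": {"guernica", "inspire", "picasso"},
           "semantics": {("guernica", "agent"), ("inspire", "predicate"), ("picasso", "theme")}},
}
# Keywords and question semantics for "Who painted Guernica?" (hand-written here).
ranking = rank_documents(doc_terms, {"paint", "guernica"},
                         {("paint", "predicate"), ("guernica", "theme")})
print(ranking)  # ['d1', 'd2']
```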
- Referring to FIGURE 3, a functional block diagram illustrates coreference resolution and ambiguity resolution within a natural language processing system 300 according to aspects of an embodiment presented herein.
- the natural language processing system 300 can support an information search engine for document indexing and retrieval. Such a natural language enabled search engine can expand the information stored within its index based upon linguistic analysis.
- the system may also support discovery of the intention within a user query by analyzing the query linguistically.
- the coreference resolution and ambiguity resolution features discussed here can operate in relation to the syntactic parsing 215, semantic mapping 225, and semantic indexing 245 as discussed with respect to FIGURE 2. Coreference resolution can be performed directly on the text content 210, or it can use information from parsing 215 or semantic mapping 225 operations.
- coreference resolution 320, 370 may be performed directly on a segmented document and also as part of semantic mapping 225. These two occurrences of coreference resolution 320, 370 may be merged, or their information outputs may be merged. It should be appreciated that coreference resolution may also occur between syntactic parsing 215 and semantic mapping 225. Coreference resolution may also occur at any other stage within a natural language processing pipeline. There may be one, two, or more coreference resolution components, or stages, at various positions within the natural language processing system. Text content 210 can be analyzed for information to store into a semantic index 250. Searching can involve querying the semantic index 250 for desired information.
- Content segmentation 310 can be performed on documents making up the text content 210.
- the documents can be segmented for more efficient and potentially more accurate coreference resolution 320.
- Coreference resolution 320 can consider potential reference relationships across an entire document. For long documents, a great deal of time can be spent comparing distant expressions.
- content segmentation 310 of documents prior to coreference resolution 320 can substantially reduce the time used for processing.
- Content segmentation 310 can effectively reduce the amount of content text 210 that is explored in attempts at coreference resolution 320.
- Content segmentation 310 can provide information to semantic coreference resolution 370 to indicate when a new document segment begins. Such information may be provided as a segmentation signal 312 or by inserting mark-up into a content document segment. An external file containing meta-information, or other mechanisms, may also be used.
- the structure of a document may be used to identify segment boundaries that reference relations are unlikely to cross.
- Document structure can be inferred from explicit markup such as paragraph boundaries, chapters, or section headings.
- Document structure can also be discovered through linguistic processing. Segments that exceed a specified length may be further subdivided. The desired subdivision length may be expressed, for example, in terms of a number of sentences or a number of words.
- heuristic or statistical criteria may be applied. Such criteria may be specified so as to keep coreferent expressions together while limiting the size of a segment to a predetermined maximum. Various other approaches for segmenting text content 210 documents may also be applied. Content segmentation 310 may also specify an entire document as one segment.
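A minimal sketch of structure-plus-length segmentation, assuming blank lines stand in for explicit paragraph markup and using a naive punctuation-based sentence splitter. The function name `segment` and the `max_sentences` limit are illustrative assumptions; a production system would use the heuristic or statistical criteria mentioned above.

```python
import re


def segment(text, max_sentences=2):
    """Split text into coreference segments.

    Blank lines mark paragraph boundaries that reference relations are assumed
    not to cross; paragraphs longer than max_sentences are cut into chunks.
    """
    segments = []
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        # Naive splitter: break after ., ! or ? followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
        for start in range(0, len(sentences), max_sentences):
            segments.append(" ".join(sentences[start:start + max_sentences]))
    return segments


text = ("John visited Mary. He met her in 2003. She was a painter. They talked.\n\n"
        "The museum closed.")
for part in segment(text):
    print(part)
```

Fixed-size chunking can split a pronoun from its antecedent, which is why criteria that tend to keep coreferences together are preferable when available.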
- Coreference resolution 320, 370 can be used to identify coreference and aliases within the text content 210. For example, when indexing the sentence “He painted Guernica,” it can be crucial to determine that “he” refers to Picasso, particularly if fact-based retrieval is in use. Resolving the pronoun as an alias of Picasso supports indexing the fact that Picasso painted Guernica, rather than the less useful fact that some male individual “he” painted Guernica. Without the ability to identify and index the referent of the pronoun, it can be difficult for a fact-based retrieval method to return the document in response to the query “Picasso painted.” Returning a relevant document that would otherwise have been missed improves the recall of the system.
- Annotation 330 may be applied to text content 210 to support tracking entities and possible coreference relationships. Confidence values in resolution decisions may also be annotated or marked up within the text content 210. The resolution determinations can be recorded by adding explicit annotation marks to the text. For example, the text “John visited Mary. He met her in 2003.” may be annotated as “[E1:0.9 John] visited [E2:0.8 Mary]. [E1:0.9 He] met [E2:0.8 her] in 2003.” Here the words “John” and “He” may be related as entity one (E1) with a confidence value of 0.9. Similarly, the words “Mary” and “her” may be related as entity two (E2) with a confidence value of 0.8. The confidence value can indicate a measure of the confidence in the coreference resolution 320 decision. Annotation can encode coreference decisions directly, or it can function as identifiers connecting relevant terms in the annotated text to additional information in stand-aside annotation 325.
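Inline markup of this kind can be read back into plain text plus an entity map. The Python sketch below assumes the bracketed form `[E1:0.9 John]` (an entity identifier, an optional colon and confidence value, a space, then the covered words) and assumes annotations are not nested; `read_annotations` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# [<entity id>(:<confidence>)? <covered words>], e.g. [E1:0.9 John] or [E21 He]
ANNOTATION = re.compile(r"\[(E\d+)(?::(\d+(?:\.\d+)?))? ([^\]]+)\]")


def read_annotations(annotated):
    """Return the plain text and a map of entity id -> [(words, confidence), ...]."""
    entities = defaultdict(list)
    for match in ANNOTATION.finditer(annotated):
        entity_id, confidence, words = match.groups()
        entities[entity_id].append((words, float(confidence) if confidence else None))
    plain = ANNOTATION.sub(lambda match: match.group(3), annotated)
    return plain, dict(entities)


plain, entities = read_annotations(
    "[E1:0.9 John] visited [E2:0.8 Mary]. [E1:0.9 He] met [E2:0.8 her] in 2003.")
print(plain)     # John visited Mary. He met her in 2003.
print(entities)  # {'E1': [('John', 0.9), ('He', 0.9)], 'E2': [('Mary', 0.8), ('her', 0.8)]}
```

Annotations without a confidence value come back with `None`, so the same reader handles both marked-up forms.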
- Coreference resolution 320 decisions may be used as part of the process of constructing semantic mapping 225. Referring expressions used by the coreference resolution 320 system may be integrated into the input representation for the semantic mapping 225 by inline annotations within the text content 210. The references may also be provided separately in an external stand-aside entity map 325.
- the same sentence may appear multiple times in different contexts. These different contexts may provide different candidates for coreference resolution 320. Since syntactic parsing 215 can be computationally expensive, it may be useful to save parsing results for sentences in a cache. Such a caching mechanism 350 can support rapidly retrieving parse information when a sentence is encountered in the future.
- If coreference resolution 320 is applied to a single sentence appearing in different contexts, it may identify different coreference relationships for the same referring expressions, since coreference may be dependent on context. Thus, different entity identifiers may be inserted inline into the text.
- the text “He is smart” appearing in two different documents may be annotated with two different identifiers, “[E21 He] is smart.” and “[E78 He] is smart.”
- the word “He” in a first document refers to a different person than the word “He” in a second document.
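Because coreference identifiers can differ for the same sentence in different documents, one possible design (an assumption, not something specified here) is to key the parse cache on the sentence with its entity markup stripped, so that “[E21 He] is smart.” and “[E78 He] is smart.” share one cached parse. In the sketch, `parse_plain` is a hypothetical stand-in for an expensive parser, and Python's `functools.lru_cache` provides the cache.

```python
import re
from functools import lru_cache

# Matches inline entity markup such as [E21 He] or [E1:0.9 John].
MARKUP = re.compile(r"\[E\d+(?::\d+(?:\.\d+)?)? ([^\]]+)\]")
parse_calls = 0


@lru_cache(maxsize=10_000)
def parse_plain(sentence):
    """Hypothetical stand-in for an expensive syntactic parse; returns tokens."""
    global parse_calls
    parse_calls += 1
    return tuple(sentence.split())


def parse_annotated(annotated_sentence):
    """Strip entity markup first so the cache key is the same across contexts."""
    return parse_plain(MARKUP.sub(r"\1", annotated_sentence))


first = parse_annotated("[E21 He] is smart.")
second = parse_annotated("[E78 He] is smart.")
print(first == second, parse_calls)  # True 1
```

The context-specific identifiers would then be re-attached to the shared parse by token position, a step this sketch leaves out.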
- There may be different sources of information for shallow coreference resolution 320. For example, in addition to the expression detection performed during coreference resolution 320, there may be a system dedicated to finding proper names in the text content 210. These different sources may identify conflicting resolution information. For example, a conflict may occur where expression boundaries cross. For instance, two systems might have identified the following conflicting referring expressions:
- the parsing component 215 can be an ambiguity-aware parser that supports direct parsing of the ambiguous input, where the syntactic parse 355 can preserve ambiguity.
- ambiguous input resolutions may need to be parsed separately, and multiple output structures may be passed to the semantic component 225 separately.
- Semantic processing 225 may be applied multiple times, once to each output of the syntactic parser 215. This may result in different semantic outputs for different syntactic inputs.
- semantic mapping 225 can combine the various inputs and process them in unison.
- Semantic mapping 225 can begin with semantic normalization 360.
- Semantic normalization may also add information about the different words of the parsed sentence. For example, the words may be identified in a lexicon and associated with their synonyms, hypernyms, possible aliases, and other lexical information.
- Semantics based coreference resolution 370 may resolve expressions based upon syntactic and semantic information. For example, “John saw Bill. He greeted him.” may resolve “he” to "John” and “him” to “Bill.” This resolution may be assigned since "he” and "John” are both subjects, while “him” and “Bill” are both objects.
- Shallow coreference resolution 320 may function by inspecting a document segment where terms occur.
- semantic coreference resolution 370, or deep coreference resolution may process one sentence at a time. Possible antecedents of sentences may be placed into an antecedent store 375 so that semantic coreference resolution 370 of later sentences may access earlier introduced elements. Antecedents may be stored with information about their grammatical function and roles in the sentence, their distance in the text, information about their relationships with other antecedents, and various other pieces of information.
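The subject-to-subject, object-to-object preference in the “John saw Bill. He greeted him.” example, combined with an antecedent store that carries mentions forward from earlier sentences, can be sketched as follows. Assumptions: sentences arrive already parsed as (mention, role) pairs, the pronoun list is a small hard-coded set, and gender and number agreement are ignored. `AntecedentStore` and `resolve_pronouns` are hypothetical names.

```python
from dataclasses import dataclass, field

PRONOUNS = {"he", "she", "it", "they", "him", "her", "them"}


@dataclass
class AntecedentStore:
    """Candidate antecedents from earlier sentences, oldest first."""
    entries: list = field(default_factory=list)

    def add(self, mention, role, sentence_index):
        self.entries.append((mention, role, sentence_index))

    def resolve(self, role):
        # Prefer the most recent earlier mention that played the same grammatical role.
        for mention, antecedent_role, _sentence_index in reversed(self.entries):
            if antecedent_role == role:
                return mention
        return None


def resolve_pronouns(parsed_sentences):
    """parsed_sentences: list of [(mention, role), ...], one list per sentence."""
    store = AntecedentStore()
    resolutions = {}
    for index, sentence in enumerate(parsed_sentences):
        for mention, role in sentence:
            if mention.lower() in PRONOUNS:
                resolutions[(index, mention)] = store.resolve(role)
        # Non-pronoun mentions become antecedents for the sentences that follow.
        for mention, role in sentence:
            if mention.lower() not in PRONOUNS:
                store.add(mention, role, index)
    return resolutions


sentences = [[("John", "subject"), ("Bill", "object")],
             [("He", "subject"), ("him", "object")]]
print(resolve_pronouns(sentences))  # {(1, 'He'): 'John', (1, 'him'): 'Bill'}
```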
- Expression merging 380 can combine expressions from shallow coreference resolution 320, stand aside annotation 325, and information from semantic coreference resolution 370. Information for terms to be combined may be identified using string alignment or annotations 330. Other mechanisms for combining two annotations on the same text may also be used.
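Merging can be sketched with token offsets instead of string alignment, which is a simplifying assumption. Spans from the coreference resolver are kept, and a span from a second source, such as a name finder, is added only if it does not cross a kept span, meaning the two partly overlap without one containing the other; this is the boundary conflict noted earlier. The function names and the preference for the first source are hypothetical choices.

```python
def crosses(a, b):
    """True when half-open spans a and b partly overlap without nesting."""
    (a_start, a_end), (b_start, b_end) = a, b
    overlap = a_start < b_end and b_start < a_end
    nested = (a_start <= b_start and b_end <= a_end) or (b_start <= a_start and a_end <= b_end)
    return overlap and not nested


def merge_expressions(primary, secondary):
    """Union two lists of (start, end) token spans, preferring primary on conflict."""
    merged = set(primary)
    for span in secondary:
        if not any(crosses(span, kept) for kept in primary):
            merged.add(span)
    return sorted(merged)


# Tokens: John(0) told(1) George(2) Washington(3) Irving(4) was(5) a(6) great(7) writer(8)
coreference_spans = [(0, 1), (2, 3), (3, 5)]  # John | George | Washington Irving
name_finder_spans = [(0, 1), (2, 4)]          # John | George Washington
print(merge_expressions(coreference_spans, name_finder_spans))  # [(0, 1), (2, 3), (3, 5)]
```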
- Syntactic parsing 215 can be a natural point of integration for the optionally detected referring expressions.
- a parser can support inferring structure in sentences such as constituents, or grammatical relationships such as subject and object.
- An ambiguity-enabled syntactic parser 215 can identify multiple alternative structural representations of a sentence.
- information from coreference resolution 320 can be used to filter the output of the syntactic parser 215 by retaining only those representations in which the left boundary of each referring expression coincides with the beginning of a compatible part from the parse. For example, coreference resolution may establish coreferents as in “[E0 John] told [E1 George] [E2 Washington Irving] was a great writer.”
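The filtering rule can be sketched by representing each alternative parse as the noun-phrase spans it proposes and keeping only alternatives in which every referring expression starts where some noun phrase starts. Treating a noun phrase as the “compatible part” is an assumption, and the two alternatives below are hand-written illustrations rather than actual parser output.

```python
def filter_parses(parses, referring_expressions):
    """Keep parses in which each referring expression begins where a noun phrase begins.

    parses: label -> list of (start, end) noun-phrase token spans.
    referring_expressions: (start, end) token spans from coreference resolution.
    """
    kept = {}
    for label, noun_phrases in parses.items():
        phrase_starts = {start for start, _end in noun_phrases}
        if all(start in phrase_starts for start, _end in referring_expressions):
            kept[label] = noun_phrases
    return kept


# Tokens: John(0) told(1) George(2) Washington(3) Irving(4) was(5) a(6) great(7) writer(8)
parses = {
    "told George that Washington Irving was ...": [(0, 1), (2, 3), (3, 5), (6, 9)],
    "told George Washington that Irving was ...": [(0, 1), (2, 4), (4, 5), (6, 9)],
}
referring = [(0, 1), (2, 3), (3, 5)]  # [E0 John] [E1 George] [E2 Washington Irving]
print(list(filter_parses(parses, referring)))
# ['told George that Washington Irving was ...']
```

The second alternative is dropped because no noun phrase in it begins at “Washington,” where E2 begins.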
- the syntactic parser 215 may separately provide four parsing possibilities:
- a process of expansion 385 can add additional information to a representation. For example, for “John sold a car to Bill,” expansion 385 may additionally output the representation for “Bill bought a car from John.” Similarly, for “John killed Bill,” expansion 385 may additionally output the representation for “Bill died.”
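Expansion can be sketched as a table of rules, each mapping a predicate's arguments to implied facts. The tuple layout (a predicate followed by its arguments in a fixed order) and the two rules, which simply mirror the sale and killing examples, are assumptions for illustration.

```python
# Each rule maps a predicate's arguments to a list of implied facts.
EXPANSION_RULES = {
    "sell": lambda seller, item, buyer: [("buy", buyer, item, seller)],
    "kill": lambda killer, victim: [("die", victim)],
}


def expand(fact):
    """Return the original fact followed by any facts implied by the rule table."""
    predicate, *arguments = fact
    rule = EXPANSION_RULES.get(predicate)
    return [fact] + (rule(*arguments) if rule else [])


print(expand(("sell", "John", "a car", "Bill")))
# [('sell', 'John', 'a car', 'Bill'), ('buy', 'Bill', 'a car', 'John')]
print(expand(("kill", "John", "Bill")))
# [('kill', 'John', 'Bill'), ('die', 'Bill')]
```

Keeping the rules in a table lets new implications be added without touching the expansion loop.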
- Documents may be ranked, in these traditional systems, according to factors such as how many of the terms from the query occur within the documents, how often the terms occur, or how close together the terms occur.
- the natural language processing system 300 can have different architectures.
- a pipeline may be provided where the information from one stage of language processing is passed as input to later stages. It should be appreciated that these approaches may be implemented with any other architecture operable to extract the facts to be indexed from natural language text content 210.
- FIGURE 4 is a flow diagram illustrating aspects of processes 400 for ambiguity-sensitive indexing with coreference resolution according to aspects of an embodiment presented herein.
- the logical operations described herein are implemented (1) as a sequence of computer implemented acts or program modules running on a computing system and/or (2) as interconnected machine logic circuits or circuit modules within the computing system. The implementation is a matter of choice dependent on the performance and other requirements of the computing system. Accordingly, the logical operations described herein are referred to variously as state operations, structural devices, acts, or modules. These operations, structural devices, acts and modules may be implemented in software, in firmware, in special purpose digital logic, and any combination thereof. It should also be appreciated that more or fewer operations may be performed than shown in the figures and described herein. These operations may also be performed sequentially, in parallel, or in a different order than those described herein.
- the routine 400 begins at operation 410, where a portion of the text content
- the text content 210 can be retrieved for analysis and indexing.
- the text content 210 can be segmented to bound the areas of text over which resolution processing must search and analyze.
- the segmentation may be based on structure within the text, such as sentences, paragraphs, pages, chapters, or sections.
- the segmentation may also be based on numbers of words, number of sentences, or other metrics of space or complexity.
- coreferences can be resolved within the text content 210. Working with the boundaries established within operation 430, coreferences may be identified and matched. Alias clusters may be established. Surface structure may be used to provide "shallow" resolution. Ambiguities that arise during coreference resolution may be annotated.
- Such annotation 340 may be provided as mark-up within the text content 210 or through the use of an external entity map. Similar annotation may also be used to label the references and referents with entity numbers. Annotation may also be provided to indicate confidence levels of the established coreference resolutions.
- syntactic parsing may convert sentences to representations that make explicit the syntactic relations among words.
- a parser 215 can apply a grammar 220 associated with the specific language to provide syntactic parse 355 information.
- semantic representations can be extracted from the text content 210.
- Information expressed in documents within the text content 210 may be formally organized as representations of relationships between entities within the text. In a general sense, these relationships may be referred to as facts.
- syntactic parse 355 information output from the syntactic parser 215 may be used to support deep coreference resolution 370. Semantic representations produced during operation 450 may also be leveraged.
- expressions from the shallow coreference resolution operation 430 may be integrated with information from the deep coreference resolution operation 455.
- An ambiguity-enabled syntactic parser 215 can identify multiple alternative structural representations of a sentence.
- Information from coreference resolution can be used to filter output of the syntactic parser 215.
- the semantics of the text content 210 can be expanded to include chosen implied representations.
- facts expressing relationships between entities, events, and states of affairs within the text content can be extracted from the semantic representations.
- the facts and entities may be stored into the semantic index 250.
- the routine 400 can terminate after operation 480. However, it should be appreciated that the routine 400 may be applied repeatedly or continuously to retrieve portions of the text content 210 to be applied to the semantic index 250.
- referring now to FIGURE 5, an illustrative computer architecture 500 can execute software components described herein for coreference resolution in an ambiguity-sensitive natural language processing system.
- the computer architecture shown in FIGURE 5 illustrates a conventional desktop, laptop, or server computer and may be utilized to execute any aspects of the software components presented herein. It should be appreciated, however, that the described software components can also be executed on other example computing environments, such as mobile devices, televisions, set-top boxes, kiosks, vehicular information systems, mobile telephones, embedded systems, or otherwise. Any one or more of the client computers 110A-110D or server computers 120 may be implemented as the computer system 500 according to embodiments.
- the computer architecture illustrated in FIGURE 5 can include a central processing unit 10 (CPU), a system memory 13, including a random access memory 14 (RAM) and a read-only memory 16 (ROM), and a system bus 11 that can couple the system memory 13 to the CPU 10.
- the computer 500 may further include a mass storage device 15 for storing an operating system 18, software, data, and various program modules, such as those associated with the natural language engine 130.
- the natural language engine 130 can execute portions of software components described herein.
- a semantic index 250 associated with the natural language engine 130 may be stored within the mass storage device 15.
- the mass storage device 15 can be connected to the CPU 10 through a mass storage controller (not illustrated) connected to the bus 11.
- the mass storage device 15 and its associated computer-readable media can provide non-volatile storage for the computer 500.
- computer-readable media can be any available computer storage media that can be accessed by the computer 500.
- computer-readable media may include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data.
- computer-readable media include, but are not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, digital versatile disks (DVD), HD-DVD, BLU-RAY, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer 500.
- the computer 500 may operate in a networked environment using logical connections to remote computers through a network such as the network 140.
- the computer 500 may connect to the network 140 through a network interface unit 19 connected to the bus 11. It should be appreciated that the network interface unit 19 may also be utilized to connect to other types of networks and remote computer systems.
- the computer 500 may also include an input/output controller 12 for receiving and processing input from a number of other devices, including a keyboard, mouse, or electronic stylus (not illustrated). Similarly, an input/output controller 12 may provide output to a video display, a printer, or other type of output device (also not illustrated).
- a number of program modules and data files may be stored in the mass storage device 15 and RAM 14 of the computer 500, including an operating system 18 suitable for controlling the operation of a networked desktop, laptop, server computer, or other computing environment.
- the mass storage device 15, ROM 16, and RAM 14 may also store one or more program modules.
- the mass storage device 15, the ROM 16, and the RAM 14 may store the natural language engine 130 for execution by the CPU 10.
- the natural language engine 130 can include software components for implementing portions of the processes discussed in detail with respect to FIGURES 2-4.
- the mass storage device 15, the ROM 16, and the RAM 14 may also store other types of program modules.
- the mass storage device 15, the ROM 16, and the RAM 14 can also store a semantic index 250 associated with the natural language engine 130.
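The segmentation step of routine 400 described above bounds the text over which coreference resolution must search. It does this using document structure (such as sentences) or a size metric (such as word count). The Python sketch below combines the two under stated assumptions. Sentences are detected with a naive punctuation-based regular expression, and `split_sentences`, `segment`, and the `max_words` budget are hypothetical names, not components described in the patent.

```python
import re
from typing import List

# Assumption: a sentence ends at ".", "!" or "?" followed by whitespace.
# A production system would use a trained sentence splitter instead.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences at terminal punctuation (hypothetical helper)."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def segment(text: str, max_words: int = 50) -> List[str]:
    """Group whole sentences into segments of at most `max_words` words.

    Sentence boundaries (structure) are never broken. The word count (a size
    metric) only decides where one segment ends and the next begins. A single
    sentence longer than `max_words` becomes a segment on its own.
    """
    segments: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in split_sentences(text):
        n = len(sentence.split())
        if current and count + n > max_words:
            segments.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        segments.append(" ".join(current))
    return segments


if __name__ == "__main__":
    doc = "Alice met Bob. She gave him a book. He thanked her."
    for seg in segment(doc, max_words=8):
        print(seg)
```

With `max_words=8`, the first two sentences (eight words in total) share one segment and the third starts a new one. Smaller budgets give tighter search windows, but they risk separating a pronoun from its antecedent.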
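Shallow coreference resolution, as described above, identifies and matches coreferences within each segment, establishes alias clusters, and annotates ambiguities, optionally with confidence levels. The following Python sketch illustrates that idea under simplifying assumptions that do not come from the patent:

- aliases are clustered by a shared final token;
- pronouns are matched on grammatical gender only;
- when more than one antecedent agrees, every candidate is kept with equal confidence and the annotation is flagged as ambiguous, rather than forcing a single choice.

`Mention`, `Annotation`, `resolve_pronouns`, and `alias_clusters` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical pronoun lexicon: pronoun -> grammatical gender.
PRONOUNS = {"he": "m", "him": "m", "his": "m", "she": "f", "her": "f", "hers": "f"}


@dataclass
class Mention:
    text: str
    position: int                 # token offset within the segment
    gender: Optional[str] = None  # "m", "f", or None when unknown


@dataclass
class Annotation:
    anaphor: Mention
    candidates: List[Mention]     # every antecedent still considered possible
    confidence: float             # confidence assigned to each candidate
    ambiguous: bool = field(init=False)

    def __post_init__(self) -> None:
        self.ambiguous = len(self.candidates) > 1


def alias_clusters(names: List[str]) -> Dict[str, List[str]]:
    """Group names sharing a final token, e.g. "Ada Lovelace" and "Lovelace"."""
    clusters: Dict[str, List[str]] = {}
    for name in names:
        clusters.setdefault(name.split()[-1].lower(), []).append(name)
    return clusters


def resolve_pronouns(mentions: List[Mention]) -> List[Annotation]:
    """Link each pronoun to all earlier, gender-compatible non-pronoun mentions.

    Ambiguity is preserved: if several antecedents agree, all are kept with
    equal confidence and the annotation is flagged as ambiguous.
    """
    annotations: List[Annotation] = []
    ordered = sorted(mentions, key=lambda m: m.position)
    for i, mention in enumerate(ordered):
        gender = PRONOUNS.get(mention.text.lower())
        if gender is None:
            continue  # not a pronoun
        candidates = [
            m for m in ordered[:i]
            if m.text.lower() not in PRONOUNS and m.gender in (gender, None)
        ]
        if candidates:
            annotations.append(Annotation(mention, candidates, 1.0 / len(candidates)))
    return annotations
```

Consider "Alice met Carol and Bob. She thanked him." Here "She" yields an ambiguous annotation with candidates Alice and Carol (confidence 0.5 each), while "him" resolves unambiguously to Bob. Annotations of this kind could then be written as mark-up or to an external entity map, the two options the description mentions.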
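Integrating the expressions from shallow coreference resolution (operation 430) with information from deep coreference resolution (operation 455) can be pictured as merging two sets of coreference links into one set of entity clusters. The minimal Python sketch below makes two assumptions: shallow resolution yields alias clusters, and deep resolution yields pairwise links. It merges them with a union-find (disjoint-set) structure. This representation and the `UnionFind` and `integrate` names are illustrative assumptions, not the patent's data model.

```python
from typing import Dict, Iterable, List, Set, Tuple


class UnionFind:
    """Disjoint-set forest over mention strings (hypothetical helper)."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def integrate(shallow: Iterable[Set[str]],
              deep: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Merge shallow alias clusters with pairwise links from deep resolution."""
    uf = UnionFind()
    for cluster in shallow:
        items = sorted(cluster)
        for item in items:
            uf.find(item)  # register singletons too
        for other in items[1:]:
            uf.union(items[0], other)
    for a, b in deep:
        uf.union(a, b)
    groups: Dict[str, Set[str]] = {}
    for x in list(uf.parent):
        groups.setdefault(uf.find(x), set()).add(x)
    # Deterministic order for readability.
    return sorted(groups.values(), key=lambda g: sorted(g))
```

For example, a deep link from "she" to "Lovelace" joins the pronoun to the shallow alias cluster {"Ada Lovelace", "Lovelace"}, producing a single entity. Union-find keeps each merge nearly constant-time, so the order in which links arrive from the two passes does not matter.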
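The description states that an ambiguity-enabled syntactic parser can identify several alternative structural representations of a sentence, and that coreference information can filter that output. The Python sketch below is one hedged way to picture this. Each alternative records where an ambiguous prepositional phrase attaches. A hypothetical heuristic, `preferred_attachment`, then uses a coreference-derived fact (who owns the instrument) to prefer one attachment. The heuristic, the `Parse` type, and `filter_parses` are illustrative assumptions, not the patent's filtering rule. If no alternative matches, all of them are kept, so ambiguity is preserved rather than discarded.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Parse:
    """One alternative structural analysis of a sentence (greatly simplified)."""
    label: str
    # Maps each ambiguous prepositional phrase to the word it attaches to.
    attachment: Dict[str, str]


def preferred_attachment(owner_of: Dict[str, str], instrument: str,
                         subject: str, verb: str, noun: str) -> str:
    """Hypothetical heuristic: if coreference-derived facts say the subject
    owns the instrument, prefer the verb (instrument reading); otherwise
    prefer the noun (modifier reading)."""
    return verb if owner_of.get(instrument) == subject else noun


def filter_parses(parses: List[Parse], phrase: str, head: str) -> List[Parse]:
    """Keep alternatives that attach `phrase` to `head`.

    Falls back to all alternatives when none match, so the ambiguity is
    retained instead of every reading being discarded.
    """
    kept = [p for p in parses if p.attachment.get(phrase) == head]
    return kept or parses


if __name__ == "__main__":
    parses = [
        Parse("instrument", {"with the telescope": "saw"}),
        Parse("modifier", {"with the telescope": "man"}),
    ]
    # From "She had bought it": She -> Mary, it -> telescope.
    owner_of = {"telescope": "Mary"}
    head = preferred_attachment(owner_of, "telescope", "Mary", "saw", "man")
    print([p.label for p in filter_parses(parses, "with the telescope", head)])
```

Take "Mary saw the man with the telescope. She had bought it." Resolving "She" to Mary and "it" to the telescope supports the fact that Mary owns the telescope. The sketch therefore keeps only the parse that attaches "with the telescope" to "saw".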
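Facts extracted from the semantic representations express relationships between entities, and the description states that facts and entities may be stored in the semantic index 250. The toy Python sketch below assumes that facts are simple (predicate, subject, object) triples over entity numbers assigned during coreference resolution. It indexes them by predicate and by entity. The `Fact` and `SemanticIndex` names and this layout are hypothetical; they are not the structure of the semantic index 250.

```python
from collections import defaultdict
from typing import DefaultDict, List, NamedTuple, Optional, Set


class Fact(NamedTuple):
    predicate: str
    subject: int  # entity number assigned during coreference resolution
    obj: int


class SemanticIndex:
    """Toy in-memory index of facts keyed by predicate and entity number."""

    def __init__(self) -> None:
        self.facts: List[Fact] = []
        self.by_predicate: DefaultDict[str, Set[int]] = defaultdict(set)
        self.by_entity: DefaultDict[int, Set[int]] = defaultdict(set)

    def add(self, fact: Fact) -> None:
        fid = len(self.facts)
        self.facts.append(fact)
        self.by_predicate[fact.predicate].add(fid)
        self.by_entity[fact.subject].add(fid)
        self.by_entity[fact.obj].add(fid)

    def query(self, predicate: Optional[str] = None,
              entity: Optional[int] = None) -> List[Fact]:
        """Return facts matching all given criteria, in insertion order."""
        ids: Optional[Set[int]] = None
        if predicate is not None:
            ids = set(self.by_predicate.get(predicate, set()))
        if entity is not None:
            ent = self.by_entity.get(entity, set())
            ids = set(ent) if ids is None else ids & ent
        if ids is None:
            ids = set(range(len(self.facts)))
        return [self.facts[i] for i in sorted(ids)]
```

Take "Ada Lovelace wrote notes. She described an algorithm." Coreference resolution gives "Ada Lovelace" and "She" the same entity number, so a query for that entity returns both facts. Without coreference, the second fact would be attached to a separate, unresolved entity.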
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Document Processing Apparatus (AREA)
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US96942607P | 2007-08-31 | 2007-08-31 | |
US96948307P | 2007-08-31 | 2007-08-31 | |
PCT/US2008/074935 WO2009029903A2 (en) | 2007-08-31 | 2008-08-29 | Coreference resolution in an ambiguity-sensitive natural language processing system |
US12/200,962 US8712758B2 (en) | 2007-08-31 | 2008-08-29 | Coreference resolution in an ambiguity-sensitive natural language processing system |
Publications (2)
Publication Number | Publication Date |
---|---|
EP2183684A2 (en) | 2010-05-12 |
EP2183684A4 (en) | 2017-10-18 |
Family
ID=42041476
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP08828084.7A Ceased EP2183684A4 (en) | 2007-08-31 | 2008-08-29 | Coreference resolution in an ambiguity-sensitive natural language processing system |
Country Status (11)
Country | Link |
---|---|
EP (1) | EP2183684A4 (en) |
JP (2) | JP2010538374A (en) |
KR (1) | KR101522049B1 (en) |
CN (1) | CN101796508B (en) |
AU (1) | AU2008292779B2 (en) |
BR (1) | BRPI0815826A2 (en) |
CA (1) | CA2698054C (en) |
MX (1) | MX2010002349A (en) |
RU (1) | RU2480822C2 (en) |
WO (1) | WO2009029903A2 (en) |
ZA (1) | ZA201001259B (en) |
Families Citing this family (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
RU2643438C2 (en) * | 2013-12-25 | 2018-02-01 | ABBYY Production LLC | Detection of linguistic ambiguity in a text |
RU2563148C2 (en) * | 2013-07-15 | 2015-09-20 | ABBYY InfoPoisk LLC | System and method for semantic search |
JP5699789B2 (en) * | 2011-05-10 | 2015-04-15 | Sony Corporation | Information processing apparatus, information processing method, program, and information processing system |
US9286291B2 (en) * | 2013-02-15 | 2016-03-15 | International Business Machines Corporation | Disambiguation of dependent referring expression in natural language processing |
CN104462053B (en) * | 2013-09-22 | 2018-10-12 | Jiangsu Jinge Network Technology Co., Ltd. | A kind of personal pronoun reference resolution method based on semantic feature in text |
US9606977B2 (en) * | 2014-01-22 | 2017-03-28 | Google Inc. | Identifying tasks in messages |
US9497153B2 (en) * | 2014-01-30 | 2016-11-15 | Google Inc. | Associating a segment of an electronic message with one or more segment addressees |
CN109101533B (en) * | 2014-05-12 | 2022-07-15 | Google LLC | Automated reading comprehension |
CA2959651C (en) * | 2014-09-03 | 2021-04-20 | The Dun & Bradstreet Corporation | System and process for analyzing, qualifying and ingesting sources of unstructured data via empirical attribution |
RU2591175C1 (en) * | 2015-03-19 | 2016-07-10 | Общество с ограниченной ответственностью "Аби ИнфоПоиск" | Method and system for global identification in collection of documents |
CN106815215B (en) * | 2015-11-30 | 2019-11-26 | Huawei Technologies Co., Ltd. | The method and apparatus for generating annotation repository |
CN107515851B (en) * | 2016-06-16 | 2021-09-10 | Canon Inc. | Apparatus and method for coreference resolution, information extraction and similar document retrieval |
JP7135399B2 (en) * | 2018-04-12 | 2022-09-13 | Fujitsu Limited | Specific program, specific method and information processing device |
JP7503000B2 (en) * | 2018-06-25 | 2024-06-19 | Salesforce, Inc. | System and method for investigating relationships between entities |
US20200074322A1 (en) * | 2018-09-04 | 2020-03-05 | Rovi Guides, Inc. | Methods and systems for using machine-learning extracts and semantic graphs to create structured data to drive search, recommendation, and discovery |
CN109815482B (en) * | 2018-12-17 | 2023-05-23 | Beijing Baidu Netcom Science and Technology Co., Ltd. | News interaction method, device, equipment and computer storage medium |
US11630953B2 (en) * | 2019-07-25 | 2023-04-18 | Baidu Usa Llc | Systems and methods for end-to-end deep reinforcement learning based coreference resolution |
US11151321B2 (en) * | 2019-12-10 | 2021-10-19 | International Business Machines Corporation | Anaphora resolution |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0268661A (en) * | 1988-09-05 | 1990-03-08 | Agency Of Ind Science & Technol | Context comprehending device |
RU2096824C1 (en) * | 1996-04-29 | 1997-11-20 | State Scientific and Technical Center for Hyperinformation Technologies | Method for automatic processing of information for personal use |
JPH1011462A (en) * | 1996-06-26 | 1998-01-16 | Fuji Xerox Co Ltd | Similar relation development dictionary, similarity evaluating device, and retrieval device |
JP3504439B2 (en) * | 1996-07-25 | 2004-03-08 | Nippon Telegraph and Telephone Corporation | Video search method |
US6185592B1 (en) * | 1997-11-18 | 2001-02-06 | Apple Computer, Inc. | Summarizing text documents by resolving co-referentiality among actors or objects around which a story unfolds |
JPH11282844A (en) * | 1998-03-26 | 1999-10-15 | Toshiba Corp | Preparing method of document, information processor and recording medium |
CA2419105C (en) * | 2002-02-20 | 2007-01-09 | Xerox Corporation | Generating with lexical functional grammars |
US20050108630A1 (en) * | 2003-11-19 | 2005-05-19 | Wasson Mark D. | Extraction of facts from text |
US20050149499A1 (en) * | 2003-12-30 | 2005-07-07 | Google Inc., A Delaware Corporation | Systems and methods for improving search quality |
US7401077B2 (en) * | 2004-12-21 | 2008-07-15 | Palo Alto Research Center Incorporated | Systems and methods for using and constructing user-interest sensitive indicators of search results |
JP4439431B2 (en) * | 2005-05-25 | 2010-03-24 | Toshiba Corporation | Communication support device, communication support method, and communication support program |
JP4654780B2 (en) * | 2005-06-10 | 2011-03-23 | Fuji Xerox Co., Ltd. | Question answering system, data retrieval method, and computer program |
US8060357B2 (en) * | 2006-01-27 | 2011-11-15 | Xerox Corporation | Linguistic user interface |
2008
- 2008-08-29 EP EP08828084.7A patent/EP2183684A4/en not_active Ceased
- 2008-08-29 BR BRPI0815826-6A2A patent/BRPI0815826A2/en not_active IP Right Cessation
- 2008-08-29 RU RU2010107148/08A patent/RU2480822C2/en not_active IP Right Cessation
- 2008-08-29 CA CA2698054A patent/CA2698054C/en not_active Expired - Fee Related
- 2008-08-29 WO PCT/US2008/074935 patent/WO2009029903A2/en active Application Filing
- 2008-08-29 MX MX2010002349A patent/MX2010002349A/en not_active Application Discontinuation
- 2008-08-29 KR KR1020107006475A patent/KR101522049B1/en not_active IP Right Cessation
- 2008-08-29 AU AU2008292779A patent/AU2008292779B2/en not_active Ceased
- 2008-08-29 CN CN200880105563XA patent/CN101796508B/en active Active
- 2008-08-29 JP JP2010523185A patent/JP2010538374A/en active Pending
2010
- 2010-02-22 ZA ZA2010/01259A patent/ZA201001259B/en unknown
2014
- 2014-07-31 JP JP2014156393A patent/JP2014238865A/en active Pending
Non-Patent Citations (1)
Title |
---|
See references of WO2009029903A2 * |
Also Published As
Publication number | Publication date |
---|---|
JP2014238865A (en) | 2014-12-18 |
KR101522049B1 (en) | 2015-05-20 |
RU2480822C2 (en) | 2013-04-27 |
CA2698054C (en) | 2015-12-22 |
CN101796508A (en) | 2010-08-04 |
EP2183684A4 (en) | 2017-10-18 |
CA2698054A1 (en) | 2009-03-05 |
CN101796508B (en) | 2013-03-06 |
RU2010107148A (en) | 2011-09-10 |
BRPI0815826A2 (en) | 2015-02-18 |
MX2010002349A (en) | 2010-07-30 |
AU2008292779A1 (en) | 2009-03-05 |
AU2008292779B2 (en) | 2012-09-06 |
ZA201001259B (en) | 2012-05-30 |
KR20100075451A (en) | 2010-07-02 |
WO2009029903A2 (en) | 2009-03-05 |
WO2009029903A3 (en) | 2009-05-07 |
JP2010538374A (en) | 2010-12-09 |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
17P | Request for examination filed | Effective date: 20100223 |
AK | Designated contracting states | Kind code of ref document: A2; Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MT NL NO PL PT RO SE SI SK TR |
AX | Request for extension of the European patent | Extension state: AL BA MK RS |
DAX | Request for extension of the European patent (deleted) | |
RAP1 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC |
RAP1 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: ZHIGU HOLDINGS LIMITED |
A4 | Supplementary search report drawn up and despatched | Effective date: 20170914 |
RIC1 | Information provided on IPC code assigned before grant | Ipc: G06F 17/27 20060101ALI20170908BHEP; Ipc: G06F 17/20 20060101AFI20170908BHEP; Ipc: G06F 17/30 20060101ALI20170908BHEP |
RAP1 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: ZHIGU HOLDINGS LIMITED |
17Q | First examination report despatched | Effective date: 20181113 |
REG | Reference to a national code | Ref country code: DE; Ref legal event code: R003 |
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED |
18R | Application refused | Effective date: 20200207 |