US20140046653A1 - Method and system for building entity hierarchy from big data - Google Patents

Info

Publication number
US20140046653A1
Authority
US
United States
Prior art keywords
entity
entities
parent
data
based
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US13/755,069
Inventor
Sridhar Gopalakrishnan
Sujatha Raviprasad Upadhyaya
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
XURMO TECHNOLOGIES PVT Ltd
Original Assignee
Xurmo Technologies Pvt. Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to IN3286/CHE/2012
Application filed by Xurmo Technologies Pvt. Ltd.
Publication of US20140046653A1
Application status is Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/28 Processing or translating of natural language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/27 Automatic analysis, e.g. parsing
    • G06F17/2765 Recognition
    • G06F17/2775 Phrasal analysis, e.g. finite state techniques, chunking
    • G06F17/278 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computer systems using knowledge-based models
    • G06N5/02 Knowledge representation

Abstract

The various embodiments herein provide a method and a system for building an entity hierarchy. The method comprises extracting a plurality of entities from a big data, determining a parent entity by understanding a context in which the entity is used, resolving the entities by bringing the synonymous entities together and holding the polysemous entities apart based on a semantic context and a syntactic context, and building a hierarchical structure of entities using knowledge repositories, ontologies and language repositories along with natural language processing techniques. The method of extracting entities from the structured data comprises identifying each data point as an entity and identifying entities based on a relationship defined with other entities. The method of extracting entities from unstructured data includes a self-learning process and a training based learning process to learn new parent entities from domain specific documents using new entity recognition models.

Description

    CROSS-REFERENCE TO RELATED APPLICATION
  • The present application claims priority of Indian provisional application serial number 3286/CHE/2012 filed on Aug. 10, 2012, and that application is incorporated in its entirety at least by reference.
  • BACKGROUND
  • 1. Technical Field
  • The embodiments herein generally relate to data mining and particularly relate to extracting and resolving entities from a large collection of data. The embodiments herein more particularly relate to a method and system for extracting entities from big data and building an entity hierarchy using language and domain models.
  • 2. Description of the Related Art
  • Big data is a large collection of information that derives its content from a plurality of structured, unstructured and semi-structured data sources. Big data requires a paradigm shift from the way data was looked at in the past. The data can no longer reside in pockets that do not talk to each other. It is imperative that all of it be considered as one and then processed. Recognizing the entities and the relationships they share is a first step toward understanding data. Entity extraction and entity type or parent entity recognition are the building blocks of analyzing big data. Therefore, entity extraction and recognition should be done with the least manual intervention, and hence a self-learning procedure is required.
  • An entity is an atomic unit of data which has an independent, self-explanatory meaning, and is also referred to as an object that makes independent sense. Entities can be named or unnamed, and include names of living and non-living things, concepts, theories or simply the language units that make independent sense. In a database context, entities and relationships help in structurally storing the contents of big data.
  • Entity extraction means processing data to identify, tag and properly account for those elements that are the names of persons, numbers, organizations, locations, and expressions such as a telephone number, among other items. An entity can consist of a single word or a bound sequence of words. Identifying entities is challenging for several reasons, not least because many entities exist only in richly varied forms.
  • Much research has been conducted on finding and identifying entities in data. An existing system discusses the extraction of named entities only. The current systems are therefore limited to the relationships that exist between named entities and never consider the relationship between concepts or between a concept and a named entity. The existing literature does suggest building an entity hierarchy but limits itself to entity extraction and resolution.
  • The existing data analysis and information extraction techniques are usually designed to target a particular media type and are not applicable to data generated by a different media type. For example, existing entity extraction techniques focus on textual data. Entities of interest, such as protein and gene names, chemical names and formulae, drug names etc., are automatically extracted from the textual part of a document.
  • The existing extraction tools merely identify and extract information based on pre-specified relations and relation-specific human-tagged examples. The existing literature does not refer to the self-learning capabilities of entity extractors. Further, the existing literature does not bring in domain ontologies and knowledge bases for semantic resolution in the context of entity extraction.
  • Accordingly, there is a need for an entity extraction method and system which is robust enough to identify new entities from big data. There is also a need for a method and system for categorizing entities in a hierarchical order to efficiently handle pattern query. Further there is also a need for a method and system for extracting entities from various data sources irrespective of the domain.
  • The above mentioned shortcomings, disadvantages and problems are addressed herein and will be understood by reading and studying the following specification.
  • SUMMARY
  • The primary object of the embodiments herein is to provide a method and system for building entity hierarchy from a collection of structured, unstructured and semi structured data.
  • Another object of the embodiments herein is to provide a method and system for extracting a plurality of entities by analyzing big data.
  • Another object of the embodiments herein is to provide a method and system for facilitating an accurate and efficient pattern query relating to entities.
  • Another object of the embodiments herein is to provide a method and system for extracting named and unnamed entities from a collection of structured, semi-structured and unstructured data in a self learning manner.
  • Another object of the embodiments herein is to provide a method and system for extracting entities and building entity hierarchy from extracted entities with least manual intervention.
  • Another object of the embodiments herein is to provide a method and system for extracting entities which is domain independent.
  • These and other objects and advantages of the present embodiments will become readily apparent from the following detailed description taken in conjunction with the accompanying drawings.
  • The various embodiments herein provide a method for building an entity hierarchy. The method comprises extracting a plurality of entities from a big data, determining its parent entity or the entity type by understanding a context in which the entity is used, resolving the entities by bringing the synonymous entities together and holding the polysemous entities apart based on a semantic context and a syntactic context and building a hierarchical structure of entities using knowledge repositories, ontologies and language repositories along with natural language processing techniques.
  • According to an embodiment herein, the big data comprises structured, semi structured and unstructured data.
  • According to an embodiment herein, each entity is associated with a parent entity.
  • According to an embodiment herein, the entity is at least one of named entities and unnamed entities. The named entities belong to one of the parent entities and include names of persons, organizations, locations, time expressions, quantities, monetary values and the like. The unnamed entities include nouns, verbs, combinations of nouns and verbs, a concept or a language unit with independent meaning. The unnamed entities belong to the parent entity concept; however, there can be a hierarchy among the various concept entities.
  • According to an embodiment herein, extracting the plurality of entities from the structured data comprises identifying each data point as an entity and identifying entities based on a relationship defined with other entities. The data point is at least one of a table entity, a value entity, an attribute entity, and a database entity.
  • According to an embodiment herein, extracting the plurality of entities from unstructured data comprises recognizing the named entities and the unnamed entities from data sources using a natural language processor based entity tagger, passing the named entities and unnamed entities through multiple entity recognition models, determining the parent entity, and storing the entities along with the respective parent entity and context specific information in an entity store.
  • According to an embodiment herein, the entity extraction from unstructured data is a combination of a self-learning process and a training based learning process.
  • According to an embodiment herein, the self-learning entity extraction process comprises performing entity recognition by tagging entities without explicitly knowing the parent entity using a natural language processing technique, passing the tagged entities through trained Entity Recognition (ER) models to learn the parent entity associated with the tagged entity, detecting the parent entity using a voting procedure, and storing the entities whose parent entities are detected in the entity store.
  • According to an embodiment herein, the self-learning entity extraction process further comprises feeding the data containing the entities whose parent entities are not detected to a Natural Language Processor (NLP) based entity detector, which involves parsing documents containing domain specific knowledge and learning the parent entities from the explicit or implicit facts stated in the documents, building new entity recognition models, passing the entities through multiple entity recognition models until the parent entity is obtained, and populating the entity recognition models with new entity recognition models built by learning from samples containing new entities whose parent entities are identified in domain specific documents through the NLP based entity detectors.
  • According to an embodiment herein, the training based entity extraction process comprises passing the data containing the tagged entities through multiple trained entity recognition models, determining one or more parent entities associated with the entities, and recognizing the appropriate parent entity based on a voting procedure.
  • According to an embodiment herein, the training based entity extraction process further comprises providing additional training samples and documents that are tagged with new domain specific entities, and populating the training samples with new kinds of parent entities suggested by the NLP based entity detectors to build new entity recognition models.
  • According to an embodiment herein, the entity recognition models to detect the parent entity use at least one of a maximum entropy model (maxent), conditional random fields (CRF), classification and clustering techniques, and NLP based techniques.
  • According to an embodiment herein, resolving the plurality of entities comprises at least one of a word sense disambiguation technique, a contextual resolution technique, syntactic similarity, and semantic similarity.
  • According to an embodiment herein, the entity extraction from semi-structured data is a combination of extracting entities from structured data and unstructured data.
  • Embodiments herein further provide a system for building an entity hierarchy. The system comprises an entity extractor to extract a plurality of entities from a big data, a Language and Domain model to conceptualize the entities in accordance with a structured context or semi-structured context, an entity resolver to resolve the entities by gathering the synonymous entities together and polysemous entities apart based on syntactic and semantic contexts, and an entity hierarchy builder to build a hierarchical structure of entities using natural language processing techniques in conjunction with the Language and Domain models.
  • According to an embodiment herein, the entity extractor comprises an entity tagger to tag named entities and unnamed entities in a data source and a parent entity detector to determine assertions of parent entity in data sources. The entities are passed through multiple entity recognition models to determine the parent entity based on a voting procedure.
  • According to an embodiment herein, the entity recognition models to detect the parent entity use at least one of a maximum entropy model (maxent), conditional random fields (CRF), classification and clustering techniques and NLP based techniques.
  • According to an embodiment herein, the entity tagger is adapted to tag the named entities and the unnamed entities, and to tag the named entities with explicit mention of the parent entity.
  • According to an embodiment herein, the entity resolver understands the context in which the entities are being used and determines the parent entity.
  • According to an embodiment herein, the entity resolver performs a contextual resolution using the Language and Domain models, which comprise at least one of language repositories, domain ontologies, and knowledge repositories, in combination with Natural Language Processing (NLP) techniques.
  • These and other aspects of the embodiments herein will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following descriptions, while indicating preferred embodiments and numerous specific details thereof, are given by way of illustration and not of limitation. Many changes and modifications can be made within the scope of the embodiments herein without departing from the spirit thereof, and the embodiments herein include all such modifications.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The other objects, features and advantages will occur to those skilled in the art from the following description of the preferred embodiment and the accompanying drawings in which:
  • FIG. 1 illustrates a block diagram of a system for building entity hierarchy from big data, according to an embodiment of the present disclosure.
  • FIG. 2 is a block diagram illustrating a self-learning entity extraction process, according to an embodiment of the present disclosure.
  • FIG. 3 is a block diagram illustrating a training based entity extraction process, according to an embodiment of the present disclosure.
  • FIG. 4 is a block diagram illustrating a process for resolving entities, according to an embodiment of the present disclosure.
  • FIG. 5 is a flow chart illustrating a method for building entity hierarchy, according to an embodiment of the present disclosure.
  • FIG. 6 is a flow chart illustrating a method for extracting entities from structured data sources, according to an embodiment of the present disclosure.
  • FIG. 7 is a flow chart illustrating a method for extracting entities from unstructured data sources through a self-learning entity extraction process, according to an embodiment of the present disclosure.
  • FIG. 8 is a flow chart illustrating a method for extracting entities from unstructured data sources through a training based entity extraction process, according to an embodiment of the present disclosure.
  • Although the specific features of the present embodiments are shown in some drawings and not in others, this is done for convenience only, as each feature can be combined with any or all of the other features in accordance with the present embodiments.
  • DETAILED DESCRIPTION OF THE EMBODIMENTS
  • In the following detailed description, reference is made to the accompanying drawings that form a part hereof, and in which the specific embodiments that can be practiced are shown by way of illustration. These embodiments are described in sufficient detail to enable those skilled in the art to practice the embodiments, and it is to be understood that logical, mechanical and other changes can be made without departing from the scope of the embodiments. The following detailed description is therefore not to be taken in a limiting sense.
  • The various embodiments herein provide a method for building an entity hierarchy. The method comprises extracting a plurality of entities from a big data, determining a parent entity by understanding a context in which the entity is used, resolving the entities by bringing the synonymous entities together and holding the polysemous entities apart based on a semantic context and a syntactic context, and building a hierarchical structure of entities using knowledge repositories, ontologies, and language repositories along with natural language processing techniques. The big data comprises structured, semi-structured and unstructured data.
  • The entity is at least one of named entities and unnamed entities, where each entity is associated with a parent entity. The named entities belong to one of the parent entities and include names of persons, organizations, locations, time expressions, quantities, monetary values and the like. The unnamed entities include nouns, verbs, combinations of nouns and verbs, a concept or a language unit with independent meaning.
  • The plurality of entities are extracted from the structured data by identifying each data point as an entity and identifying entities based on a relationship defined with other entities. The data point is at least one of a table entity, a value entity, an attribute entity, and a database entity.
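  • As an illustration only, the structured-data case described above can be sketched as follows. The `Entity` record, the `extract_structured` function and the sample table are hypothetical names introduced here; the disclosure does not prescribe a data layout. The sketch treats the database, each table, each column (attribute) and each cell (value) as an entity, and records the containing element as the related entity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str               # "database", "table", "attribute" or "value"
    parent: Optional[str]   # name of the containing element, if any


def extract_structured(database: str,
                       tables: Dict[str, List[Dict[str, object]]]) -> List[Entity]:
    """Treat every data point of a (hypothetical) in-memory database as an entity."""
    entities = [Entity(database, "database", None)]
    for table_name, rows in tables.items():
        entities.append(Entity(table_name, "table", database))
        seen_attributes = set()
        for row in rows:
            for attribute, value in row.items():
                # Each column is recorded once per table.
                if attribute not in seen_attributes:
                    seen_attributes.add(attribute)
                    entities.append(Entity(attribute, "attribute", table_name))
                # Each cell value is related to the attribute it appears under.
                entities.append(Entity(str(value), "value", attribute))
    return entities


sample = {"patients": [{"name": "Asha", "city": "Pune"}]}
for entity in extract_structured("hospital", sample):
    print(entity)
```

  • In this sketch the relationship "defined with other entities" is reduced to simple containment; foreign keys or other schema relations would need an additional pass.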
  • The plurality of entities are extracted from the unstructured data by recognizing the named entities and the unnamed entities from data sources using a natural language processor based entity tagger, passing the named entities and unnamed entities through multiple entity recognition models, determining the parent entity, and storing the entities along with the respective parent entity and context specific information in an entity store.
  • The entity extraction process from unstructured data herein is a combination of a self-learning process and training based learning process.
  • The self-learning entity extraction process comprises performing entity recognition by tagging entities without explicitly knowing the parent entity using a natural language processing technique, passing the tagged entities through trained Entity Recognition (ER) models to learn the parent entity associated with the tagged entity, detecting the parent entity using a voting procedure, and storing the entities whose parent entities are detected in the entity store.
  • The self-learning entity extraction process further comprises feeding the data containing the entities whose parent entities are not detected to a Natural Language Processor (NLP) based entity detector, which involves parsing documents containing domain specific knowledge and learning the parent entities from the explicit or implicit facts stated in the documents, building new entity recognition models, passing the entities through multiple entity recognition models until the parent entity is obtained, and populating the entity recognition models with new entity recognition models built by learning from samples containing new entities whose parent entities are identified in domain specific documents through the NLP based entity detectors.
  • The training based entity extraction process comprises passing the data containing the tagged entities through multiple trained entity recognition models, determining one or more parent entities associated with the entities, and recognizing the appropriate parent entity based on a voting procedure.
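  • The voting procedure is not specified further in this disclosure. A minimal sketch, assuming a simple majority vote with abstentions, is given below. The three toy models, the `vote_parent` function and the `min_votes` threshold are hypothetical stand-ins for trained entity recognition models such as maxent or CRF taggers.

```python
from collections import Counter
from typing import Callable, List, Optional

# A hypothetical model maps an entity string to a parent-entity label,
# or returns None when it abstains.
Model = Callable[[str], Optional[str]]


def medicine_model(entity: str) -> Optional[str]:
    return "Medicine" if entity.lower() in {"aspirin", "ibuprofen"} else None


def location_model(entity: str) -> Optional[str]:
    return "Location" if entity.lower() in {"pune", "chennai"} else None


def suffix_model(entity: str) -> Optional[str]:
    return "Medicine" if entity.lower().endswith(("rin", "fen")) else None


def vote_parent(entity: str, models: List[Model], min_votes: int = 2) -> Optional[str]:
    """Return the label most models agree on, or None if undecided."""
    votes = Counter()
    for model in models:
        label = model(entity)
        if label is not None:
            votes[label] += 1
    ranked = votes.most_common()
    if not ranked:
        return None
    top_label, top_count = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == top_count
    # An undecided entity stays unresolved and can be handed to later stages.
    if tied or top_count < min_votes:
        return None
    return top_label


models = [medicine_model, location_model, suffix_model]
print(vote_parent("Aspirin", models))
print(vote_parent("Pune", models, min_votes=1))
```

  • Returning `None` on a tie or on too few votes is a design choice of this sketch: it keeps uncertain entities out of the entity store until another process determines the parent entity.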
  • The training based entity extraction process further comprises providing additional training samples and documents that are tagged with new domain specific entities, and populating the training samples with new kinds of parent entities suggested by the NLP based entity detectors to build new entity recognition models.
  • The entity recognition models to detect the parent entity use at least one of a maximum entropy model (maxent), conditional random fields (CRF), classification and clustering techniques, and NLP based techniques.
  • The embodiments herein use at least one of a word sense disambiguation technique, contextual resolution technique, syntactic similarity, and semantic similarity method for resolving the plurality of entities.
  • The entity extraction from semi-structured data is a combination of extracting entities from structured data and unstructured data.
  • The system for building an entity hierarchy comprises an entity extractor to extract a plurality of entities from a big data, a Language and Domain model to conceptualize the entities in accordance with a structured context or semi-structured context, an entity resolver to resolve the entities by gathering the synonymous entities together and polysemous entities apart based on syntactic and semantic contexts, and an entity hierarchy builder to build a hierarchical structure of entities using natural language processing techniques in conjunction with the Language and Domain models.
  • The entity extractor comprises an entity tagger to tag named entities and unnamed entities in a data source, and a parent entity detector to determine assertions of parent entity in data sources. The entities are passed through multiple entity recognition models to determine the parent entity based on a voting procedure.
  • The entity recognition models to detect the parent entity use at least one of a maximum entropy model (maxent), conditional random fields (CRF), classification and clustering techniques, and NLP based techniques.
  • The entity tagger is adapted to tag the named entities and the unnamed entities, and tag the named entities with explicit mention of the parent entity.
  • The entity resolver understands the context in which the entities are being used and determines the parent entity.
  • The entity resolver performs a contextual resolution using the Language and Domain models, which comprise at least one of language repositories, domain ontologies, and knowledge repositories, in combination with Natural Language Processing (NLP) techniques.
  • FIG. 1 illustrates a block diagram of a system for building entity hierarchy, according to an embodiment of the present disclosure. The system comprises an entity hierarchy builder 101, a Language and Domain model 102, an entity resolver 103 and an entity extractor 104. The entity extractor 104 assists in processing and extracting entities from big data. The entity extractor 104 represents an input to the system in all forms of data including structured, semi-structured and unstructured data from heterogeneous sources. The entity extractor 104 continues to learn from the available data through different learning algorithms.
  • The entity extractor 104 comprises an entity tagger 105 and a parent entity detector 106. The entity tagger 105 tags named entities and unnamed entities in a data source and the parent entity detector 106 determines assertions of parent entity in data sources. The parent entity detector 106 passes the entities through multiple entity recognition models 107 to determine the parent entity based on a voting procedure.
  • The entity recognition models 107 herein use at least one of a maximum entropy model (maxent), conditional random fields (CRF), classification and clustering techniques, and NLP based techniques to detect the parent entity.
  • The Language and Domain model 102 is a repository used to understand the context in which the entity is being used and to determine a parent entity/entity type of the entity. The Language and Domain model 102 comprises one or more language repositories 102 a, domain ontologies 102 b and knowledge repositories 102 c. The Language and Domain model 102 is also used to resolve the entities in structured and semi-structured context.
  • The entity resolver 103 resolves the entities by gathering the synonymous entities together and polysemous entities apart based on syntactic and semantic contexts. The entity resolution strategies are based on resolving the syntactic and semantic context. The entity resolver uses standard domain ontologies, knowledge repositories, language repositories and natural language processing techniques to establish resolution.
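  • A minimal sketch of the two resolution goals, assuming a toy alias table in place of a language repository and a toy sense inventory in place of a domain ontology, is shown below. `SYNONYMS`, `SENSES` and `resolve` are hypothetical names, and the word-overlap scoring is a simplified stand-in for the word sense disambiguation techniques the disclosure mentions.

```python
from typing import Dict, Set

# Hypothetical stand-in for a language repository: alias -> canonical name.
SYNONYMS: Dict[str, str] = {
    "nyc": "new york city",
    "big apple": "new york city",
}

# Hypothetical stand-in for a domain ontology: surface form -> senses,
# each sense described by a small set of cue words.
SENSES: Dict[str, Dict[str, Set[str]]] = {
    "apple": {
        "apple (company)": {"iphone", "shares", "ceo"},
        "apple (fruit)": {"eat", "juice", "orchard"},
    }
}


def resolve(mention: str, context: str) -> str:
    """Merge synonymous mentions and split polysemous ones by context."""
    key = mention.lower().strip()
    key = SYNONYMS.get(key, key)          # synonymous entities are brought together
    senses = SENSES.get(key)
    if not senses:
        return key
    words = set(context.lower().split())
    # Polysemous entities are held apart: pick the sense sharing most cue words.
    best = max(senses, key=lambda sense: len(senses[sense] & words))
    return best if senses[best] & words else key


print(resolve("NYC", "flights to NYC"))
print(resolve("Apple", "Apple shares rose after the iPhone launch"))
print(resolve("apple", "I eat an apple with juice"))
```

  • When no cue word matches, the sketch returns the unresolved surface form rather than guessing a sense.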
  • The entity hierarchy builder 101 arranges and stores the plurality of entities in a hierarchical manner by using a plurality of Natural Language Processing (NLP) techniques with the support of Language and Domain model 102.
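  • One way to picture the stored hierarchy, assuming each entity has at most one parent entity, is a child-to-parent map from which a parent-to-children index is derived. The function names and sample entities below are hypothetical; the disclosure does not state how the hierarchy is stored.

```python
from collections import defaultdict
from typing import Dict, List


def build_hierarchy(parent_of: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert a child -> parent map into a parent -> sorted children index."""
    children = defaultdict(list)
    for entity, parent in parent_of.items():
        children[parent].append(entity)
    return {parent: sorted(kids) for parent, kids in children.items()}


def ancestors(entity: str, parent_of: Dict[str, str]) -> List[str]:
    """Walk upward from an entity to the root of its hierarchy."""
    chain = []
    seen = {entity}
    while entity in parent_of:
        entity = parent_of[entity]
        if entity in seen:      # guard against accidental cycles
            break
        seen.add(entity)
        chain.append(entity)
    return chain


parent_of = {
    "Aspirin": "Medicine",
    "Ibuprofen": "Medicine",
    "Medicine": "Concept",
    "Pune": "Location",
}
print(build_hierarchy(parent_of))
print(ancestors("Aspirin", parent_of))
```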
  • FIG. 2 is a block diagram illustrating a self-learning entity extraction process, according to an embodiment of the present disclosure. The named and unnamed entities are recognized in every source of data by a natural language processor based entity tagger 105. The entity tagger 105 recognizes an entity to be a possible entity without determining the parent entity. The entity tagger 105 tags a named entity and an unnamed entity in a data source. The tagged entities are passed through multiple entity recognition models 107 which use different techniques to determine the parent entity of the tagged entity based on a voting procedure.
  • Based on the requirement, one or more ER models 107 are used. The ER models use either the same technique or different techniques, but learn different types of names. For instance, a first model learns medicine names, a second model learns location names and the like. The detected entities are then passed through a voting based parent entity detector 201 to check whether the parent entity is detected. The entities whose parent entity is detected 202 are stored in an entity storage 203. The entities whose parent entity is still unknown undergo a process of entity resolution. The entity resolution is executed by Manual/Domain specific NLP based Parent Entity Detectors 204. The entity resolution uses either a manual or an automatic parent entity detector that searches for assertions of parent entities in a domain specific document collection and structured data. The Manual/Domain specific NLP based Parent Entity Detectors 204 find new parent entities and also identify entities with respect to the new parent entities. The entities whose parent entities are still not determined are sent to the collection of NER models through a training sample 205. The collection of models 107 keeps receiving new models built by learning from new entities whose parent entities are resolved through the NLP based parent entity detectors, new training samples and documents that are tagged with new/domain specific entities. The entities with unknown parent entity keep going through the parent entity detection processes until the parent entity is detected (205).
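  • The loop described above can be sketched in a highly simplified form. In this sketch, dictionaries stand in for entity recognition models, a single regular expression stands in for the NLP based parent entity detector, and voting is omitted. All names (`ASSERTION`, `lookup`, `learn_from_documents`, `self_learning_pass`) and the sample data are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical pattern for explicit assertions such as "Metformin is a drug".
ASSERTION = re.compile(r"\b([A-Z][\w-]*) is an? ([a-z]+)\b")


def lookup(entity: str, models: List[Dict[str, str]]) -> Optional[str]:
    """Ask each (dictionary-backed) model for the parent entity."""
    for model in models:
        parent = model.get(entity.lower())
        if parent is not None:
            return parent
    return None


def learn_from_documents(documents: List[str]) -> Dict[str, str]:
    """Collect entity -> parent pairs asserted in domain specific documents."""
    learned = {}
    for text in documents:
        for entity, parent in ASSERTION.findall(text):
            learned[entity.lower()] = parent.capitalize()
    return learned


def self_learning_pass(tagged: List[str], models: List[Dict[str, str]],
                       documents: List[str]) -> Tuple[Dict[str, str], List[str]]:
    store, unresolved = {}, []
    for entity in tagged:
        parent = lookup(entity, models)
        if parent is None:
            unresolved.append(entity)
        else:
            store[entity] = parent
    # Build a new model only from assertions about still-unknown entities.
    wanted = {entity.lower() for entity in unresolved}
    new_model = {e: p for e, p in learn_from_documents(documents).items() if e in wanted}
    if new_model:
        models.append(new_model)
    still_unknown = []
    for entity in unresolved:
        parent = lookup(entity, models)
        if parent is None:
            still_unknown.append(entity)   # would re-enter the loop later
        else:
            store[entity] = parent
    return store, still_unknown


models = [{"pune": "Location"}]
docs = ["Metformin is a drug used for diabetes."]
print(self_learning_pass(["Pune", "Metformin", "Zorblax"], models, docs))
```

  • Entities left in the second returned list correspond to those that, in the process described above, keep going through parent entity detection as new documents and training samples arrive.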
  • FIG. 3 is a block diagram illustrating a training based entity extraction process, according to an embodiment of the present disclosure. The training based models use a set of training samples 206 which are already trained to detect some parent entities of from a big data 301. The big data 301 represents an input to the system in the form of data including structured, semi-structured and unstructured data from heterogeneous sources. The entities are tagged in the data and the data containing the tagged entities are passed through multiple trained entity recognition models. The one or more parent entities associated with the tagged entities are then determined. The parent entities are then passed through a voting based parent entity detector to identify the appropriate parent entity based on a voting procedure. The entities whose parent entity is detected 303 is stored in the entity storage 204.
  • To resolve the entities whose parent entity is not detected, additional training samples and documents that are tagged with new domain specific entities are generated, and the training samples 205 are populated with new kinds of parent entities suggested by the NLP based entity detectors to build new entity recognition models 302.
  • The training based entity recognition is also referred to as automatic learning, because an entity does not have to be explicitly included in the training set to be recognized, as long as it is of the designated type.
  • FIG. 4 is a block diagram illustrating a process for resolving entities, according to an embodiment of the present disclosure. The entity extractor 104 extracts entities from big data and determines the parent entity associated with each entity. The entity extractor 104 then passes the unresolved entities to an entity resolver 103 for entity disambiguation.
  • The entity resolver 103 comprises a plurality of resolution modules 401, such as entity resolution module 1, entity resolution module 2 . . . to entity resolution module n, for resolving the extracted entities. The entity resolver 103 understands the context in which the entity is being used to determine the parent entity. The entity resolver 103 uses any one or a combination of a word sense disambiguation technique, a contextual resolution technique, a syntactic similarity and a semantic similarity for resolving the entities.
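  • Of the techniques listed above, syntactic similarity is the simplest to illustrate. The sketch below groups surface forms whose normalized spellings are similar, using the Python standard library `difflib.SequenceMatcher`. It is only an assumption-laden example: the function names, the normalization rule and the 0.6 threshold are hypothetical, and the sketch does not cover word sense disambiguation, contextual resolution or semantic similarity.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def syntactic_similarity(a, b):
    """Similarity in [0, 1] between two surface forms after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def group_similar(names, threshold=0.6):
    """Greedily place each name in the first group whose first member is similar enough."""
    groups = []
    for name in names:
        for group in groups:
            if syntactic_similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])      # no similar group found, start a new one
    return groups
```

Spelling similarity alone cannot hold polysemous entities apart, which is why the disclosure combines it with context based techniques.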
  • The entity extraction process is a combination of automatic learning and training based learning. An initial set of named entities and concepts is identified based on certain rudimentary NLP based rules, and a parent entity of the identified entities and concepts is discovered. Parent entity learning is also facilitated by using tagged data for training. As more than one method is used for learning, a voting based entity resolution is performed, which establishes entity recognition by a maximum score. A voting based entity resolver 402 conducts a voting procedure on the output of the various entity resolvers 103 and provides resolved entities for further processing.
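  • One way to read "recognition by a maximum score" is that each resolver proposes a parent entity together with a confidence, and the label with the highest accumulated score wins. The sketch below assumes exactly that; the function name `vote_by_score` and the `(label, score)` input format are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict


def vote_by_score(proposals):
    """Pick the parent label with the highest total score.

    proposals: iterable of (parent_label, score) pairs, one per resolver output.
    Returns (label, total_score), or None when there are no proposals.
    """
    totals = defaultdict(float)
    for label, score in proposals:
        totals[label] += score
    if not totals:
        return None
    return max(totals.items(), key=lambda item: item[1])
```

Summing scores lets two moderately confident resolvers outvote a single highly confident one; an embodiment could equally use a plain majority count.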
  • FIG. 5 is a flow chart illustrating a method for building entity hierarchy, according to an embodiment of the present disclosure. The method comprises extraction of entities, resolution of entities and then building a hierarchy of entities. At first, big data is taken as input and processed to extract a plurality of entities. The big data comprises structured, semi-structured and unstructured data from a plurality of heterogeneous data sources. The extracted entities are any one of a named entity and an unnamed entity (501). Then, a parent entity or a super entity of each extracted entity is determined by understanding the context in which the extracted entity is used (502). After the determination of the parent entity, the entities are resolved by bringing the synonymous entities together and holding the polysemous entities apart based on a semantic context and a syntactic context (503). Finally, a hierarchical structure of entities is built by using knowledge repositories, ontologies, and language repositories along with Natural Language Processing (NLP) techniques (504).
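  • Once each entity has a resolved parent, the final step (504) amounts to arranging the entity to parent assignments as a tree. The sketch below shows only that arrangement and assumes the assignments are already available as a dictionary and contain no cycles; the function name `build_hierarchy` and the nested dictionary output are illustrative choices, and the use of knowledge repositories and ontologies described above is not modeled.

```python
from collections import defaultdict


def build_hierarchy(parent_of):
    """Turn {entity: parent} assignments into a nested {node: {child: ...}} tree.

    Assumes the assignments are acyclic; a cycle would recurse without end.
    """
    children = defaultdict(list)
    for entity, parent in parent_of.items():
        children[parent].append(entity)

    # Roots are parents that are never themselves assigned a parent.
    roots = sorted(p for p in children if p not in parent_of)

    def subtree(node):
        return {child: subtree(child) for child in sorted(children.get(node, []))}

    return {root: subtree(root) for root in roots}
```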
  • FIG. 6 is a flow chart illustrating a method for extracting entities from structured data sources, according to an embodiment of the present disclosure. The method comprises identifying each data point as an entity at 601. Here, each data point is any one of a table entity, a value entity, an attribute entity and a database entity. After the data points are identified as entities, the entities are identified based on a relationship defined with other entities at 602.
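  • As an illustration of steps 601 and 602, the sketch below walks an SQLite database with the Python standard library `sqlite3` module and emits database, table, attribute and value entities together with the relationships between them. SQLite is used only because it is self-contained; the function name, the `(kind, name)` entity tuples and the relation labels such as `value_of` are hypothetical.

```python
import sqlite3


def extract_structured_entities(conn, db_name="db"):
    """Return (entities, relations) for every table, column and distinct value."""
    entities = [("database", db_name)]
    relations = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    for table in tables:
        entities.append(("table", table))
        relations.append((table, "table_of", db_name))
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        for column in columns:
            attribute = f"{table}.{column}"
            entities.append(("attribute", attribute))
            relations.append((attribute, "attribute_of", table))
            query = f'SELECT DISTINCT "{column}" FROM "{table}"'
            for (value,) in conn.execute(query):
                entities.append(("value", value))
                relations.append((value, "value_of", attribute))
    return entities, relations
```

A caller would open a connection, for example with `sqlite3.connect(":memory:")`, load or attach the data, and pass the connection in.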
  • FIG. 7 is a flow chart illustrating a method for extracting entities from unstructured data through a self-learning entity extraction process, according to an embodiment of the present disclosure. The entities are tagged from big data without explicitly recognizing the parent entity. Further, entity recognition is performed by tagging entities without explicitly knowing the parent entity, using a natural language processing technique, at 701. The entity recognition is performed using at least one of a maximum entropy model (maxent), a conditional random field (CRF), and classification and clustering techniques. The tagged entities are passed through trained entity recognition models at 702 to learn the parent entity associated with the tagged entity at 703. Further, it is checked whether the parent entity of the tagged entity is detected at 704. If the parent entity is detected, the entities are stored in an entity storage along with the parent entity information and other context specific information at 705. If the parent entity is not detected, the entities are passed through an NLP based entity detector at 706. The NLP based entity detector parses documents that contain domain specific knowledge and learns from the explicit statements that are present. For instance, if the NLP based entity detector comes across sentences like . . . "medicines such as penicillin", then the term penicillin is learnt as a medicine name, and from then on penicillin is tagged as a medicine rather than just a named entity. Further, one or more new entity recognition models are built which include information of the new entities and new parent entities at 707. The entities are then passed through multiple new entity recognition models until the parent entity is obtained at 708. Further, the entity recognition models are populated with the new entity recognition models at 709. The new entity recognition models are built by learning from samples containing new entities whose parent entities are identified in domain specific documents through the NLP based entity detectors, and are added to the existing entity recognition models.
  • FIG. 8 is a flow chart illustrating a method for extracting entities from unstructured data sources through a training based entity extraction process, according to an embodiment of the present disclosure. The data containing tagged entities is passed through multiple trained entity recognition models at 801. Then, one or more parent entities associated with a tagged entity are determined at 802, and the appropriate parent entity for the tagged entity is recognized based on a voting procedure at 803. Further, additional training samples and documents that are tagged with new domain specific entities are provided at 804. Finally, the training sample is populated with new kinds of parent entities suggested by the NLP based entity detectors in order to build new recognition models at 805.
  • The embodiments herein extract entities based on certain NLP rules. The entity extractor continues to learn from the available data through different learning algorithms. Including concepts among the entities widens the scope for querying the data, and the ability to recognize and resolve concepts gives much higher expressiveness for modeling semantics. The entity hierarchy helps in bringing in entities related to the entities mentioned in the query. Building an entity hierarchy functions as query enrichment (query enrichment with semantic resolution) that allows any query to encompass all the entities of interest and eliminate the ones that are not pertinent.
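  • A minimal sketch of query enrichment, assuming the hierarchy is available as a mapping from each parent to its direct children: every query term is expanded with all of its descendants. The function name `enrich_query` and the mapping format are hypothetical, and the sketch covers only the expansion, not the elimination of entities that are not pertinent.

```python
def enrich_query(terms, children):
    """Expand query terms with all descendants found in the hierarchy.

    terms:    list of entity names mentioned in the query.
    children: dict mapping a parent entity to a list of its direct children.
    Returns the expanded terms in depth-first order without duplicates.
    """
    expanded = []
    seen = set()
    stack = list(reversed(terms))      # reversed so terms are visited in order
    while stack:
        term = stack.pop()
        if term in seen:
            continue
        seen.add(term)
        expanded.append(term)
        stack.extend(reversed(children.get(term, [])))
    return expanded
```

With such an expansion, a query for a parent entity such as a medicine class can also match documents that mention only specific members of that class.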
  • The present disclosure finds relevant entities and relationships, even though the entity names are not mentioned explicitly in the big data. The entity hierarchy is useful, for example but not limited to, when a user has to search/query about entities and their relationships/interactions with other named entities/concepts. The entity hierarchy encompasses all the named and unnamed entities that exist in the big data. The embodiments of the present disclosure provide immense benefit in Retail, Health and Pharmaceutical services, Banking and Insurance, etc.
  • Implementation of the techniques, blocks, steps and means described above can be done in various ways. For example, these techniques, blocks, steps and means can be implemented in hardware, software, or a combination thereof. For a hardware implementation, the processing units may be implemented within one or more application specific integrated circuits (ASICs), digital signal processing devices, programmable logic devices, field programmable gate arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, other electronic units designed to perform the functions described above and/or a combination thereof.
  • Also, it is noted that the embodiments can be described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although the flowcharts describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations can be rearranged. A process is terminated when its operations are completed, but could have additional steps not included in the figure. A process can correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination corresponds to a return of the function to the calling function or the main function.
  • Furthermore, embodiments can be implemented by hardware, software, scripting languages, firmware, middleware, microcode, hardware description languages and/or any combination thereof. When implemented in software, firmware, middleware, scripting language and/or microcode, the program code or code segments to perform the necessary tasks can be stored in a machine readable medium, such as a storage medium. A code segment or machine-executable instruction may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a script, a class, or any combination of instructions, data structures and/or program statements. A code segment can be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters and/or memory contents. Information, arguments, parameters, data, etc. can be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, etc.
  • For a firmware and/or software implementation, the methodologies can be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. Any machine-readable medium tangibly embodying instructions can be used in implementing the methodologies described herein.
  • The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification.

Claims (20)

What is claimed is:
1. A method of building an entity hierarchy comprises:
extracting a plurality of entities from a big data;
determining a parent entity by understanding a context in which the entity is used;
resolving the entities by bringing the synonymous entities together and holding the polysemous entities apart based on a semantic context and a syntactic context; and
building a hierarchical structure of entities using knowledge repositories, ontologies, and language repositories along with natural language processing techniques.
2. The method of claim 1, wherein the big data comprises structured, semi-structured and unstructured data.
3. The method of claim 1, wherein each entity is associated with a parent entity.
4. The method of claim 1, wherein the entity is at least one of named entities and unnamed entities:
where the named entities belong to one of the parent entities and include names of persons, organizations, locations, time expressions, quantities, monetary values and the like; and
the unnamed entities include nouns, verbs, combinations of nouns and verbs, a concept or a language unit with independent meaning.
5. The method of claim 1, wherein extracting the plurality of entities from the structured data comprises:
identifying each data point as an entity; and
identifying entities based on a relationship defined with other entities;
wherein the data point comprises at least one of a table entity, a value entity, an attribute entity, and a database entity.
6. The method of claim 5, wherein extracting the plurality of entities from unstructured data comprises:
recognizing the named entities and the unnamed entities from data sources using a natural language processor based entity tagger;
passing the named entities and unnamed entities through multiple entity recognition models;
determining the parent entity; and
storing the entities along with respective parent entity and context specific information in an entity store.
7. The method of claim 6, wherein the entity extraction from unstructured data is a combination of a self-learning process and a training based learning process.
8. The method of claim 7, wherein the self-learning entity extraction process comprises:
performing entity recognition by tagging entities without explicitly knowing the parent entity using a natural language processing technique;
passing the tagged entities through trained Entity Recognition (ER) models to learn the parent entity associated with the tagged entity;
detecting the parent entity using a voting procedure; and
storing the entities whose parent entities are detected in the entity store.
9. The method of claim 7, further comprises:
feeding the data containing the entities whose parent entities are not detected to a Natural Language Processor (NLP) based entity detector which involves parsing documents containing domain specific knowledge and learning the parent entities from the explicit or implicit facts stated in the documents;
building new entity recognition models;
passing the entities through multiple entity recognition models until the parent entity is obtained; and
populating the entity recognition models with new entity recognition models built by learning from samples containing new entities whose parent entities are identified in domain specific documents through the NLP based entity detectors.
10. The method of claim 7, wherein the training based entity extraction process comprises:
passing the data containing the tagged entities through multiple trained entity recognition models;
determining one or more parent entities associated with the entities; and
recognizing the appropriate parent entity based on a voting procedure.
11. The method of claim 10, further comprises:
providing additional training samples and documents that are tagged with new domain specific entities; and
populating the training sample with new kind of parent entities suggested by the NLP based entity detectors to build new entity recognition models.
12. The method of claim 1, wherein the entity recognition models to detect the parent entity use at least one of:
maximum entropy model (maxent);
conditional random fields (CRF);
classification and clustering techniques; and
NLP based techniques.
13. The method of claim 1, wherein resolving the plurality of entities comprises at least one of a:
word sense disambiguation technique;
contextual resolution technique;
syntactic similarity; and
semantic similarity.
14. The method of claim 1, wherein the entity extraction from semi-structured data is a combination of extracting entities from structured data and unstructured data.
15. A system for building an entity hierarchy comprises:
an entity extractor to extract a plurality of entities from a big data;
a Language and Domain model to conceptualize the entities in accordance with a structured context or semi-structured context;
an entity resolver to resolve the entities by gathering the synonymous entities together and polysemous entities apart based on syntactic and semantic contexts; and
an entity hierarchy builder to build a hierarchical structure of entities using natural language processing techniques in conjunction with the Language and Domain models.
16. The system of claim 15, wherein the entity extractor comprises:
an entity tagger to tag named entities and unnamed entities in a data source; and
a parent entity detector to determine assertions of parent entity in data sources, where the entities are passed through multiple entity recognition models to determine the parent entity based on a voting procedure.
17. The system of claim 16, wherein the entity recognition models to detect the parent entity use at least one of:
maximum entropy model (maxent);
conditional random fields (CRF);
classification and clustering techniques; and
NLP based techniques.
18. The system of claim 15, wherein entity tagger is adapted to:
tag the named entities and the unnamed entities; and
tag the named entities with explicit mention of the parent entity.
19. The system of claim 15, wherein the entity resolver understands the context in which the entities are being used and determines the parent entity.
20. The system of claim 15, wherein the entity resolver performs a contextual resolution using the Language and Domain models which comprise at least one of a:
language repositories;
domain ontologies; and
knowledge repositories in combination with Natural Language Processing (NLP) techniques.
US13/755,069 2012-08-10 2013-01-31 Method and system for building entity hierarchy from big data Abandoned US20140046653A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
IN3286/CHE/2012 2012-08-10
IN3286CH2012 2012-08-10

Publications (1)

Publication Number Publication Date
US20140046653A1 true US20140046653A1 (en) 2014-02-13

Family

ID=50066829

Family Applications (4)

Application Number Title Priority Date Filing Date
US13/755,069 Abandoned US20140046653A1 (en) 2012-08-10 2013-01-31 Method and system for building entity hierarchy from big data
US13/755,047 Abandoned US20140046977A1 (en) 2012-08-10 2013-01-31 System and method for mining patterns from relationship sequences extracted from big data
US13/755,059 Abandoned US20140046892A1 (en) 2012-08-10 2013-01-31 Method and system for visualizing information extracted from big data
US13/755,062 Active 2033-12-20 US9239830B2 (en) 2012-08-10 2013-01-31 System and method for building relationship hierarchy

Country Status (1)

Country Link
US (4) US20140046653A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150081729A1 (en) * 2013-09-19 2015-03-19 GM Global Technology Operations LLC Methods and systems for combining vehicle data
US20150242407A1 (en) * 2014-02-22 2015-08-27 SourceThought, Inc. Discovery of Data Relationships Between Disparate Data Sets
US20160125067A1 (en) * 2014-10-31 2016-05-05 International Business Machines Corporation Entity resolution between datasets
US9727642B2 (en) 2014-11-21 2017-08-08 International Business Machines Corporation Question pruning for evaluating a hypothetical ontological link
US9892362B2 (en) 2014-11-18 2018-02-13 International Business Machines Corporation Intelligence gathering and analysis using a question answering system
US10157177B2 (en) * 2016-10-28 2018-12-18 Kira Inc. System and method for extracting entities in electronic documents
US10318870B2 (en) 2014-11-19 2019-06-11 International Business Machines Corporation Grading sources and managing evidence for intelligence analysis
US10331659B2 (en) 2016-09-06 2019-06-25 International Business Machines Corporation Automatic detection and cleansing of erroneous concepts in an aggregated knowledge base

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9565275B2 (en) 2012-02-09 2017-02-07 Rockwell Automation Technologies, Inc. Transformation of industrial data into useful cloud information
US9477936B2 (en) 2012-02-09 2016-10-25 Rockwell Automation Technologies, Inc. Cloud-based operator interface for industrial automation
US20140046653A1 (en) * 2012-08-10 2014-02-13 Xurmo Technologies Pvt. Ltd. Method and system for building entity hierarchy from big data
US20140122414A1 (en) * 2012-10-29 2014-05-01 Xurmo Technologies Private Limited Method and system for providing a personalization solution based on a multi-dimensional data
CA2899314C (en) * 2013-02-14 2018-11-27 24/7 Customer, Inc. Categorization of user interactions into predefined hierarchical categories
US9438648B2 (en) 2013-05-09 2016-09-06 Rockwell Automation Technologies, Inc. Industrial data analytics in a cloud platform
US9709978B2 (en) 2013-05-09 2017-07-18 Rockwell Automation Technologies, Inc. Using cloud-based data for virtualization of an industrial automation environment with information overlays
US9786197B2 (en) 2013-05-09 2017-10-10 Rockwell Automation Technologies, Inc. Using cloud-based data to facilitate enhancing performance in connection with an industrial automation system
US9703902B2 (en) 2013-05-09 2017-07-11 Rockwell Automation Technologies, Inc. Using cloud-based data for industrial simulation
US10026049B2 (en) * 2013-05-09 2018-07-17 Rockwell Automation Technologies, Inc. Risk assessment for industrial systems using big data
US9989958B2 (en) 2013-05-09 2018-06-05 Rockwell Automation Technologies, Inc. Using cloud-based data for virtualization of an industrial automation environment
US9146980B1 (en) * 2013-06-24 2015-09-29 Google Inc. Temporal content selection
US9564122B2 (en) * 2014-03-25 2017-02-07 Nice Ltd. Language model adaptation based on filtered data
US10042837B2 (en) 2014-12-02 2018-08-07 International Business Machines Corporation NLP processing of real-world forms via element-level template correlation
WO2016114433A1 (en) * 2015-01-16 2016-07-21 주식회사 솔트룩스 Unstructured data processing system and method
WO2016200373A1 (en) * 2015-06-09 2016-12-15 Hewlett-Packard Development Company, L.P. Generating further groups of events based on similarity values and behavior matching using a representation of behavior
CN104933164B (en) * 2015-06-26 2018-10-09 华南理工大学 Massive Internet data relationships between entities named in the extraction method and system
US20170154385A1 (en) * 2015-11-29 2017-06-01 Vatbox, Ltd. System and method for automatic validation
CN105701203A (en) * 2016-01-12 2016-06-22 北京中交兴路车联网科技有限公司 Information storage and query method and system for big data clusters
US10204146B2 (en) 2016-02-09 2019-02-12 Ca, Inc. Automatic natural language processing based data extraction
US10042846B2 (en) * 2016-04-28 2018-08-07 International Business Machines Corporation Cross-lingual information extraction program
US10228916B2 (en) * 2016-06-23 2019-03-12 International Business Machines Corporation Predictive optimization of next task through asset reuse

Citations (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060095653A1 (en) * 2004-11-03 2006-05-04 Fleming James S Network of networks of associative memory networks for knowledge management
US7162465B2 (en) * 2001-12-21 2007-01-09 Tor-Kristian Jenssen System for analyzing occurrences of logical concepts in text documents
US20080250020A1 (en) * 2001-01-20 2008-10-09 Pointcross, Inc Ontological representation of knowledge
US20090144319A1 (en) * 2007-11-29 2009-06-04 Rajendra Bhagwatisingh Panwar External system integration into automated attribute discovery
US20090144609A1 (en) * 2007-10-17 2009-06-04 Jisheng Liang NLP-based entity recognition and disambiguation
US20090198642A1 (en) * 2008-01-31 2009-08-06 International Business Machines Corporation Method and system for generating an ontology
US7657540B1 (en) * 2003-02-04 2010-02-02 Seisint, Inc. Method and system for linking and delinking data records
US20100070442A1 (en) * 2008-09-15 2010-03-18 Siemens Aktiengesellschaft Organizing knowledge data and experience data
US20100223276A1 (en) * 2007-03-27 2010-09-02 Faleh Jassem Al-Shameri Automated Generation of Metadata for Mining Image and Text Data
US20110078159A1 (en) * 2009-09-30 2011-03-31 Microsoft Corporation Long-Query Retrieval
US20110307440A1 (en) * 2009-03-02 2011-12-15 Olga Perevozchikova Method for the fully modifiable framework distribution of data in a data warehouse taking account of the preliminary etymological separation of said data
US8275608B2 (en) * 2008-07-03 2012-09-25 Xerox Corporation Clique based clustering for named entity recognition system
US20130013603A1 (en) * 2011-05-24 2013-01-10 Namesforlife, Llc Semiotic indexing of digital resources
US20130103764A1 (en) * 2010-06-24 2013-04-25 Arbitron Mobile Oy Network server arrangement for processing non-parametric, multi-dimensional, spatial and temporal human behavior or technical observations measured pervasively, and related method for the same
US20130124193A1 (en) * 2011-11-15 2013-05-16 Business Objects Software Limited System and Method Implementing a Text Analysis Service
US20130238531A1 (en) * 2012-03-09 2013-09-12 Sap Ag Automatic Combination and Mapping of Text-Mining Services
US20130311467A1 (en) * 2012-05-18 2013-11-21 Xerox Corporation System and method for resolving entity coreference
US20130317803A1 (en) * 2012-05-24 2013-11-28 The Keyw Corporation Enterprise-scalable model-based analytics
US20140025626A1 (en) * 2012-04-19 2014-01-23 Avalon Consulting, LLC Method of using search engine facet indexes to enable search-enhanced business intelligence analysis
US20140032574A1 (en) * 2012-07-23 2014-01-30 Emdadur R. Khan Natural language understanding using brain-like approach: semantic engine using brain-like approach (sebla) derives semantics of words and sentences
US20140046977A1 (en) * 2012-08-10 2014-02-13 Xurmo Technologies Pvt. Ltd. System and method for mining patterns from relationship sequences extracted from big data
US20140089472A1 (en) * 2011-06-03 2014-03-27 David Tessler System and method for semantic knowledge capture
US8725666B2 (en) * 2010-02-26 2014-05-13 Lawrence Livermore National Security, Llc. Information extraction system
US20150026159A1 (en) * 2012-03-05 2015-01-22 Evresearch Ltd Digital Resource Set Integration Methods, Interfaces and Outputs
US8943004B2 (en) * 2012-02-08 2015-01-27 Adam Treiser Tools and methods for determining relationship values
US8954440B1 (en) * 2010-04-09 2015-02-10 Wal-Mart Stores, Inc. Selectively delivering an article

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6665669B2 (en) * 2000-01-03 2003-12-16 Db Miner Technology Inc. Methods and system for mining frequent patterns
WO2004042493A2 (en) * 2002-10-24 2004-05-21 Agency For Science, Technology And Research Method and system for discovering knowledge from text documents
WO2004090285A1 (en) * 2003-03-31 2004-10-21 Baker Hughes Incorporated Real-time drilling optimization based on mwd dynamic measurements
US20060074980A1 (en) * 2004-09-29 2006-04-06 Sarkar Pte. Ltd. System for semantically disambiguating text information
US7849048B2 (en) * 2005-07-05 2010-12-07 Clarabridge, Inc. System and method of making unstructured data available to structured data analysis tools
US7882126B2 (en) * 2008-02-07 2011-02-01 International Business Machines Corporation Systems and methods for computation of optimal distance bounds on compressed time-series data
CA2646117A1 (en) * 2008-12-02 2010-06-02 Oculus Info Inc. System and method for visualizing connected temporal and spatial information as an integrated visual representation on a user interface
US8719308B2 (en) * 2009-02-16 2014-05-06 Business Objects, S.A. Method and system to process unstructured data
US8229883B2 (en) * 2009-03-30 2012-07-24 Sap Ag Graph based re-composition of document fragments for name entity recognition under exploitation of enterprise databases
WO2013162607A1 (en) * 2012-04-27 2013-10-31 Empire Technology Development Llc Multiple variable coverage memory for database indexing

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
Dalvi, Bhavana Bharat, William W. Cohen, and Jamie Callan. "Websets: Extracting sets of entities from the web using unsupervised information extraction." Proceedings of the fifth ACM international conference on Web search and data mining. ACM, 2012. *
Etzioni, Oren, et al. "Unsupervised named-entity extraction from the web: An experimental study." Artificial intelligence 165.1 (2005): 91-134. *
Hashem, Ibrahim Abaker Targio, et al. "The rise of 'big data' on cloud computing: Review and open research issues." Information Systems 47 (2015): 98-115. *
Ottens, Kévin, Marie-Pierre Gleizes, and Pierre Glize. "A multi-agent system for building dynamic ontologies." Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems. ACM, 2007. *
Sekine, Satoshi, and Chikashi Nobata. "Definition, Dictionaries and Tagger for Extended Named Entity Hierarchy." LREC. 2004. *
Wang, Ting, et al. Automatic extraction of hierarchical relations from text. Springer Berlin Heidelberg, 2006. *
Wick, Michael, Sameer Singh, and Andrew McCallum. "A discriminative hierarchical model for fast coreference at large scale." Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1. Association for Computational Linguistics, 2012. *
Zhang, Min, et al. "Discovering relations between named entities from a large raw corpus using tree similarity-based clustering." Natural Language Processing-IJCNLP 2005. Springer Berlin Heidelberg, 2005. 378-389. *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150081729A1 (en) * 2013-09-19 2015-03-19 GM Global Technology Operations LLC Methods and systems for combining vehicle data
US20150242407A1 (en) * 2014-02-22 2015-08-27 SourceThought, Inc. Discovery of Data Relationships Between Disparate Data Sets
US20160125067A1 (en) * 2014-10-31 2016-05-05 International Business Machines Corporation Entity resolution between datasets
US9996607B2 (en) * 2014-10-31 2018-06-12 International Business Machines Corporation Entity resolution between datasets
US9892362B2 (en) 2014-11-18 2018-02-13 International Business Machines Corporation Intelligence gathering and analysis using a question answering system
US10318870B2 (en) 2014-11-19 2019-06-11 International Business Machines Corporation Grading sources and managing evidence for intelligence analysis
US9727642B2 (en) 2014-11-21 2017-08-08 International Business Machines Corporation Question pruning for evaluating a hypothetical ontological link
US10331659B2 (en) 2016-09-06 2019-06-25 International Business Machines Corporation Automatic detection and cleansing of erroneous concepts in an aggregated knowledge base
US10157177B2 (en) * 2016-10-28 2018-12-18 Kira Inc. System and method for extracting entities in electronic documents

Also Published As

Publication number Publication date
US20140046877A1 (en) 2014-02-13
US9239830B2 (en) 2016-01-19
US20140046892A1 (en) 2014-02-13
US20140046977A1 (en) 2014-02-13

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION