WO2008042264A3 - Distributed method for integrating data mining and text categorization techniques

Publication number
WO2008042264A3
WO2008042264A3 PCT/US2007/020938 US2007020938W WO2008042264A3 WO 2008042264 A3 WO2008042264 A3 WO 2008042264A3 US 2007020938 W US2007020938 W US 2007020938W WO 2008042264 A3 WO2008042264 A3 WO 2008042264A3
Authority
WO
WIPO (PCT)
Prior art keywords
documents
data mining
rule
text categorization
integrating data
Prior art date
Application number
PCT/US2007/020938
Other languages
French (fr)
Other versions
WO2008042264A2 (en)
Inventor
Ali Hadjarian
Original Assignee
Inferx Corp
Ali Hadjarian
Priority date
2006-09-29
Filing date
2007-09-28
Publication date
2008-07-24
Application filed by Inferx Corp, Ali Hadjarian
Publication of WO2008042264A2
Publication of WO2008042264A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification

Abstract

A method for prediction analysis using text categorization is provided. The method includes the steps of: grouping a plurality of text documents into a plurality of classes; selecting the top m most discriminatory terms for each class of documents using statistics-based measures; determining for each document the presence or absence of each of the discriminatory terms; learning rule-based models of each class of documents using a rule learning algorithm; determining, for at least a portion of the plurality of documents, whether a given learned rule has been satisfied by each respective document; creating a database of the rules associated with the documents satisfying the rules; and performing distributed data mining to form a predictive result based on at least a portion of the plurality of documents.
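The steps listed in the abstract can be illustrated with a small Python sketch. It is not the patented implementation: the abstract names neither the statistical measure nor the rule learning algorithm, so the chi-square score and the single-term rule learner below are stand-ins chosen for brevity. All function names, the sample corpus, and the `min_precision` threshold are hypothetical, and the final distributed data mining step is left out.

```python
def tokenize(text):
    """Lowercased set of whitespace-separated terms in a document."""
    return set(text.lower().split())

def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return 0.0 if denom == 0 else n * (n11 * n00 - n10 * n01) ** 2 / denom

def top_terms(docs, labels, target, m):
    """Top m terms positively associated with class `target` (assumed measure)."""
    scores = {}
    for term in set().union(*docs):
        n11 = sum(1 for d, y in zip(docs, labels) if term in d and y == target)
        n10 = sum(1 for d, y in zip(docs, labels) if term in d and y != target)
        n01 = sum(1 for d, y in zip(docs, labels) if term not in d and y == target)
        n00 = len(docs) - n11 - n10 - n01
        if n11 * n00 > n10 * n01:  # keep positively associated terms only
            scores[term] = chi_square(n11, n10, n01, n00)
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:m]

def learn_rules(docs, labels, terms_by_class, min_precision=0.8):
    """Stand-in rule learner: one 'IF term present THEN class' rule per term."""
    rules = []
    for cls, terms in terms_by_class.items():
        for term in terms:
            covered = [y for d, y in zip(docs, labels) if term in d]
            if covered and covered.count(cls) / len(covered) >= min_precision:
                rules.append({"id": len(rules), "if_present": term, "then": cls})
    return rules

def rule_table(docs, rules):
    """(rule id, document index) pairs for every document satisfying a rule."""
    return [(r["id"], i) for r in rules
            for i, d in enumerate(docs) if r["if_present"] in d]

# Hypothetical corpus, already grouped into two classes.
corpus = [
    ("engine failure reported during flight", "incident"),
    ("engine fire caused emergency landing", "incident"),
    ("routine maintenance completed on schedule", "routine"),
    ("scheduled maintenance inspection passed", "routine"),
]
docs = [tokenize(text) for text, _ in corpus]
labels = [label for _, label in corpus]

terms_by_class = {c: top_terms(docs, labels, c, m=1) for c in sorted(set(labels))}
all_terms = [t for terms in terms_by_class.values() for t in terms]
# Presence (1) or absence (0) of each selected term in each document.
features = [[int(t in d) for t in all_terms] for d in docs]
rules = learn_rules(docs, labels, terms_by_class)
table = rule_table(docs, rules)
```

In this sketch `table` plays the role of the database linking rules to the documents that satisfy them. A distributed data mining step would then consume such tables from several sites, which is beyond what the abstract specifies in enough detail to illustrate here.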

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US84809206P 2006-09-29 2006-09-29
US60/848,092 2006-09-29

Publications (2)

Publication Number Publication Date
WO2008042264A2 (en) 2008-04-10
WO2008042264A3 (en) 2008-07-24

Family

ID=39268995

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/020938 WO2008042264A2 (en) 2006-09-29 2007-09-28 Distributed method for integrating data mining and text categorization techniques

Country Status (1)

Country Link
WO (1) WO2008042264A2 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IN2015CH03928A (en) 2015-07-30 2015-08-14 Wipro Ltd
CN112766506A (en) * 2021-01-19 2021-05-07 澜途集思生态科技集团有限公司 Knowledge base construction method based on architecture

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030041042A1 (en) * 2001-08-22 2003-02-27 Insyst Ltd Method and apparatus for knowledge-driven data mining used for predictions
US20050154692A1 (en) * 2004-01-14 2005-07-14 Jacobsen Matthew S. Predictive selection of content transformation in predictive modeling systems
US20060101048A1 (en) * 2004-11-08 2006-05-11 Mazzagatti Jane C KStore data analyzer
US20060190310A1 (en) * 2005-02-24 2006-08-24 Yasu Technologies Pvt. Ltd. System and method for designing effective business policies via business rules analysis

Also Published As

Publication number Publication date
WO2008042264A2 (en) 2008-04-10

Similar Documents

Publication Publication Date Title
US10599700B2 (en) Systems and methods for narrative detection and frame detection using generalized concepts and relations
CN106201465B (en) Software project personalized recommendation method for open source community
CN110287494A (en) A method of the short text Similarity matching based on deep learning BERT algorithm
CN108304468A (en) A kind of file classification method and document sorting apparatus
CN110188192B (en) Multi-task network construction and multi-scale criminal name law enforcement combined prediction method
CN109472462B (en) Project risk rating method and device based on multi-model stack fusion
WO2007140386A3 (en) Learning syntactic patterns for automatic discovery of causal relations from text
WO2005050473A3 (en) Clustering of text for structuring of text documents and training of language models
CN103336969A (en) Image meaning parsing method based on soft glance learning
CN112966525B (en) Law field event extraction method based on pre-training model and convolutional neural network algorithm
CN112001166A (en) Intelligent question-answer sentence-to-semantic matching method and device for government affair consultation service
Tyagi et al. Demystifying the role of natural language processing (NLP) in smart city applications: background, motivation, recent advances, and future research directions
CN114239574A (en) Miner violation knowledge extraction method based on entity and relationship joint learning
Liu et al. Semantic neural network ensemble for automated dependency relation extraction from bridge inspection reports
CN104699614A (en) Software defect component predicting method
CN114528398A (en) Emotion prediction method and system based on interactive double-graph convolutional network
Cho et al. Recognizing architectural objects in floor-plan drawings using deep-learning style-transfer algorithms
CN105868917A (en) Complex product availability characteristic identification method designed for maintainability
Shcherbakova et al. Societies of strangers do not speak less complex languages
Atapattu et al. Acquisition of triples of knowledge from lecture notes: A natural language processing approach
WO2008042264A3 (en) Distributed method for integrating data mining and text categorization techniques
CN112818698A (en) Fine-grained user comment sentiment analysis method based on dual-channel model
Kinra et al. Textual data in transportation research: Techniques and opportunities
Kim et al. Approach to the extraction of design features of interior design elements using image recognition technique
CN110489514A (en) Promote system and method, the event extraction method and system of event extraction annotating efficiency

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 07838993; Country of ref document: EP; Kind code of ref document: A2)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 07838993; Country of ref document: EP; Kind code of ref document: A2)