WO2008042264A3 - Distributed method for integrating data mining and text categorization techniques - Google Patents
Distributed method for integrating data mining and text categorization techniques
- Publication number
- WO2008042264A3
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- documents
- data mining
- rule
- text categorization
- integrating data
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N20/00—Machine learning
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06F—ELECTRIC DIGITAL DATA PROCESSING
      - G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
        - G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
          - G06F16/35—Clustering; Classification
            - G06F16/355—Class or cluster creation or modification
Abstract
A method for prediction analysis using text categorization is provided. The method includes the steps of: grouping a plurality of text documents into a plurality of classes; selecting the top m most discriminatory terms for each class of documents using statistics-based measures; determining for each document the presence or absence of each of the discriminatory terms; learning rule-based models of each class of documents using a rule learning algorithm; determining, for at least a portion of the plurality of documents, whether a given learned rule has been satisfied by each respective document; creating a database of the rules associated with the documents satisfying the rules; and performing distributed data mining to form a predictive result based on at least a portion of the plurality of documents.
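The first few steps of the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption, not taken from the patent: the chi-square statistic stands in for the unspecified statistical measure, only positively associated terms are scored, the two rules are hand-written placeholders for the output of a rule learning algorithm, and the names `top_m_terms`, `binary_features`, `satisfies`, and `rule_db` are hypothetical. The distributed data mining step is not shown.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def top_m_terms(docs, labels, target, m):
    """Return the m terms most positively associated with class `target`."""
    doc_terms = [set(d.lower().split()) for d in docs]
    vocab = set().union(*doc_terms)
    scores = {}
    for term in vocab:
        pairs = list(zip(doc_terms, labels))
        n11 = sum(1 for t, y in pairs if term in t and y == target)
        n10 = sum(1 for t, y in pairs if term in t and y != target)
        n01 = sum(1 for t, y in pairs if term not in t and y == target)
        n00 = sum(1 for t, y in pairs if term not in t and y != target)
        # Assumption: terms that are negatively associated with the class score 0.
        positive = n11 * n00 > n10 * n01
        scores[term] = chi_square(n11, n10, n01, n00) if positive else 0.0
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:m]


def binary_features(doc, terms):
    """Record the presence or absence of each selected term in one document."""
    tokens = set(doc.lower().split())
    return {term: term in tokens for term in terms}


def satisfies(rule, features):
    """A rule is a conjunction of (term -> required presence) conditions."""
    return all(features.get(term, False) == required
               for term, required in rule.items())


# Toy corpus, already grouped into two classes (illustrative data only).
DOCS = [
    "engine failure reported",
    "engine overheating failure",
    "invoice payment overdue",
    "payment received invoice",
]
LABELS = ["fault", "fault", "billing", "billing"]

# Top m = 2 terms per class, then one shared presence/absence feature space.
selected = {c: top_m_terms(DOCS, LABELS, c, 2) for c in sorted(set(LABELS))}
all_terms = sorted({t for terms in selected.values() for t in terms})
features = [binary_features(d, all_terms) for d in DOCS]

# Hand-written stand-ins for rules that a rule learner would produce.
rules = {
    "fault_rule": {"engine": True, "failure": True},
    "billing_rule": {"invoice": True, "engine": False},
}

# "Database" mapping each rule to the indices of documents that satisfy it.
rule_db = {
    name: [i for i, f in enumerate(features) if satisfies(rule, f)]
    for name, rule in rules.items()
}
```

In this sketch `rule_db` plays the role of the database of rules associated with the documents satisfying them. A fuller pipeline would feed such a table to the data mining step, which is outside what this example attempts.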
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US84809206P | 2006-09-29 | 2006-09-29 | |
US60/848,092 | 2006-09-29 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2008042264A2 (en) | 2008-04-10 |
WO2008042264A3 (en) | 2008-07-24 |
Family
ID=39268995
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2007/020938 (WO2008042264A2) | Distributed method for integrating data mining and text categorization techniques | 2006-09-29 | 2007-09-28 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2008042264A2 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
IN2015CH03928A (en) | 2015-07-30 | 2015-08-14 | Wipro Ltd | |
CN112766506A (en) * | 2021-01-19 | 2021-05-07 | 澜途集思生态科技集团有限公司 | Knowledge base construction method based on architecture |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030041042A1 (en) * | 2001-08-22 | 2003-02-27 | Insyst Ltd | Method and apparatus for knowledge-driven data mining used for predictions |
US20050154692A1 (en) * | 2004-01-14 | 2005-07-14 | Jacobsen Matthew S. | Predictive selection of content transformation in predictive modeling systems |
US20060101048A1 (en) * | 2004-11-08 | 2006-05-11 | Mazzagatti Jane C | KStore data analyzer |
US20060190310A1 (en) * | 2005-02-24 | 2006-08-24 | Yasu Technologies Pvt. Ltd. | System and method for designing effective business policies via business rules analysis |
- 2007-09-28: WO application PCT/US2007/020938 (WO2008042264A2), active, Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2008042264A2 (en) | 2008-04-10 |
Similar Documents
Publication | Title |
---|---|
US10599700B2 (en) | Systems and methods for narrative detection and frame detection using generalized concepts and relations |
CN106201465B (en) | Software project personalized recommendation method for open source community |
CN110287494A (en) | A method of the short text Similarity matching based on deep learning BERT algorithm |
CN108304468A (en) | A kind of file classification method and document sorting apparatus |
CN110188192B (en) | Multi-task network construction and multi-scale criminal name law enforcement combined prediction method |
CN109472462B (en) | Project risk rating method and device based on multi-model stack fusion |
WO2007140386A3 (en) | Learning syntactic patterns for automatic discovery of causal relations from text |
WO2005050473A3 (en) | Clustering of text for structuring of text documents and training of language models |
CN103336969A (en) | Image meaning parsing method based on soft glance learning |
CN112966525B (en) | Law field event extraction method based on pre-training model and convolutional neural network algorithm |
CN112001166A (en) | Intelligent question-answer sentence-to-semantic matching method and device for government affair consultation service |
Tyagi et al. | Demystifying the role of natural language processing (NLP) in smart city applications: background, motivation, recent advances, and future research directions |
CN114239574A (en) | Miner violation knowledge extraction method based on entity and relationship joint learning |
Liu et al. | Semantic neural network ensemble for automated dependency relation extraction from bridge inspection reports |
CN104699614A (en) | Software defect component predicting method |
CN114528398A (en) | Emotion prediction method and system based on interactive double-graph convolutional network |
Cho et al. | Recognizing architectural objects in floor-plan drawings using deep-learning style-transfer algorithms |
CN105868917A (en) | Complex product availability characteristic identification method designed for maintainability |
Shcherbakova et al. | Societies of strangers do not speak less complex languages |
Atapattu et al. | Acquisition of triples of knowledge from lecture notes: A natural language processing approach |
WO2008042264A3 (en) | Distributed method for integrating data mining and text categorization techniques |
CN112818698A (en) | Fine-grained user comment sentiment analysis method based on dual-channel model |
Kinra et al. | Textual data in transportation research: Techniques and opportunities |
Kim et al. | Approach to the extraction of design features of interior design elements using image recognition technique |
CN110489514A (en) | Promote system and method, the event extraction method and system of event extraction annotating efficiency |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 07838993; Country of ref document: EP; Kind code of ref document: A2 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | EP: PCT application non-entry in European phase | Ref document number: 07838993; Country of ref document: EP; Kind code of ref document: A2 |