CN1180377C - Method for text mining a semi-structured document set - Google Patents
Method for text mining a semi-structured document set
- Publication number
- CN1180377C
- Authority
- CN
- China
- Prior art keywords
- node
- document
- semi
- mining
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Left column group: based on the vector space model (TFIDF). Right column group: based on the structured link vector model.

Mi | Nj (TFIDF) | M(ni,j) (TFIDF) | M(F(i,j)) (TFIDF) | Nj (structured link) | M(ni,j) (structured link) | M(F(i,j)) (structured link) |
---|---|---|---|---|---|---|
63 | 60 | 37 | 0.602 | 59 | 52 | 0.852 |
76 | 69 | 53 | 0.731 | 71 | 62 | 0.844 |
82 | 88 | 62 | 0.729 | 89 | 79 | 0.924 |
86 | 87 | 68 | 0.786 | 86 | 74 | 0.860 |
73 | 67 | 49 | 0.700 | 70 | 60 | 0.839 |
61 | 78 | 41 | 0.590 | 69 | 53 | 0.815 |
45 | 45 | 32 | 0.711 | 42 | 37 | 0.851 |
54 | 63 | 38 | 0.650 | 58 | 41 | 0.732 |
66 | 74 | 52 | 0.743 | 71 | 58 | 0.847 |
38 | 28 | 20 | 0.606 | 35 | 31 | 0.849 |
76 | 68 | 53 | 0.736 | 72 | 64 | 0.865 |
42 | 35 | 23 | 0.597 | 40 | 32 | 0.780 |
Overall | | | F=0.69 | | | F=0.84 |
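The surviving text does not define the columns, but the numbers are consistent with a common reading: Mi is the size of reference class i, Nj the size of the matching group j, and ni,j the documents they share. Under that interpretation every F(i,j) entry matches the usual F-measure F(i,j) = 2·ni,j / (Mi + Nj), the harmonic mean of precision ni,j/Nj and recall ni,j/Mi, to three decimals, and each overall F matches the Mi-weighted average of the per-row scores. The sketch below recomputes both from the table's own numbers; the names `pair_f`, `overall_f`, `TFIDF` and `STRUCTURED_LINK` are hypothetical, and the formulas are an inferred interpretation rather than a method stated by the patent.

```python
"""Recompute the comparison table's F-measures (an interpretation, not the patent's code).

Assumed column meanings: m_i = size of reference class i, n_j = size of the
matching group j, n_ij = documents shared by both.
"""

# (M_i, N_j, n_ij, reported F) per row, vector space model (TFIDF)
TFIDF = [
    (63, 60, 37, 0.602), (76, 69, 53, 0.731), (82, 88, 62, 0.729),
    (86, 87, 68, 0.786), (73, 67, 49, 0.700), (61, 78, 41, 0.590),
    (45, 45, 32, 0.711), (54, 63, 38, 0.650), (66, 74, 52, 0.743),
    (38, 28, 20, 0.606), (76, 68, 53, 0.736), (42, 35, 23, 0.597),
]

# Same reference classes, structured link vector model
STRUCTURED_LINK = [
    (63, 59, 52, 0.852), (76, 71, 62, 0.844), (82, 89, 79, 0.924),
    (86, 86, 74, 0.860), (73, 70, 60, 0.839), (61, 69, 53, 0.815),
    (45, 42, 37, 0.851), (54, 58, 41, 0.732), (66, 71, 58, 0.847),
    (38, 35, 31, 0.849), (76, 72, 64, 0.865), (42, 40, 32, 0.780),
]


def pair_f(m_i: int, n_j: int, n_ij: int) -> float:
    """Harmonic mean of precision n_ij/n_j and recall n_ij/m_i."""
    precision = n_ij / n_j
    recall = n_ij / m_i
    return 2 * precision * recall / (precision + recall)


def overall_f(rows) -> float:
    """Average the per-row F, weighting each row by its share of all M_i."""
    total = sum(m for m, _, _, _ in rows)
    return sum(m / total * pair_f(m, n, nij) for m, n, nij, _ in rows)


if __name__ == "__main__":
    for name, rows in (("TFIDF", TFIDF), ("structured link", STRUCTURED_LINK)):
        # Largest gap between the recomputed and the printed per-row F
        worst = max(abs(pair_f(m, n, nij) - f) for m, n, nij, f in rows)
        print(f"{name}: overall F = {overall_f(rows):.2f}, max row deviation = {worst:.4f}")
```

Running it reports overall F values of 0.69 and 0.84, in line with the table's last row. Weighting by Mi is the choice that reproduces those two figures; an unweighted mean of the rows would give slightly different values.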
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB021290458A CN1180377C (zh) | 2002-08-29 | 2002-08-29 | Method for text mining a semi-structured document set |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB021290458A CN1180377C (zh) | 2002-08-29 | 2002-08-29 | Method for text mining a semi-structured document set |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1399228A (zh) | 2003-02-26 |
CN1180377C (zh) | 2004-12-15 |
Family ID: 4746113
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB021290458A (CN1180377C, zh; Expired - Lifetime) | 2002-08-29 | 2002-08-29 | Method for text mining a semi-structured document set |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1180377C (zh) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
SG133421A1 (en) * | 2005-12-13 | 2007-07-30 | Singapore Tech Dynamics Pte | Method and apparatus for an algorithm development environment for solving a class of real-life combinatorial optimization problems |
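CN100418086C (zh) * | 2006-08-22 | 2008-09-10 | 北京北大方正电子有限公司 | Method for variable-data typesetting of text |
CN100447793C (zh) * | 2007-01-10 | 2008-12-31 | 苏州大学 | Method for extracting web page query interfaces based on visual features |
CN102436480B (zh) * | 2011-10-15 | 2013-11-06 | 西安交通大学 | Text-oriented method for mining association relationships between knowledge units |
CN104063411B (zh) * | 2013-09-12 | 2016-05-25 | 江苏金鸽网络科技有限公司 | Enterprise intelligence collection method based on Porter's five forces model |
CN107943986B (zh) * | 2017-11-30 | 2022-05-17 | 睿视智觉(深圳)算法技术有限公司 | Big data analysis and mining system |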
- 2002-08-29: application CNB021290458A filed in CN, published as CN1180377C (zh); status: Expired - Lifetime
Also Published As
Publication number | Publication date |
---|---|
CN1399228A (zh) | 2003-02-26 |
Similar Documents
Publication | Title |
---|---|
Dutta et al. | QROCK: A quick version of the ROCK algorithm for clustering of categorical data |
Ryang et al. | Indexed list-based high utility pattern mining with utility upper-bound reduction and pattern combination techniques |
Sohrabi et al. | Efficient colossal pattern mining in high dimensional datasets |
CN106570128A (zh) | Mining algorithm based on association rule analysis |
CN107316049A (zh) | Transfer learning classification method based on semi-supervised self-training |
CN106815369A (zh) | Text classification method based on the Xgboost classification algorithm |
CN102043851A (zh) | Multi-document automatic summarization method based on frequent itemsets |
CN111090811B (zh) | Method and system for extracting hot topics from massive news data |
CN106529564A (zh) | Automatic food image classification method based on a convolutional neural network |
Lin et al. | High utility pattern mining using the maximal itemset property and lexicographic tree structures |
CN112287118B (zh) | Method for frequent subgraph mining and prediction of event patterns |
Kim et al. | Efficient mining of high utility pattern with considering of rarity and length |
CN107291877A (zh) | Frequent itemset mining method based on the Apriori algorithm |
CN1180377C (zh) | Method for text mining a semi-structured document set |
Nguyen et al. | Efficient algorithms for mining colossal patterns in high dimensional databases |
Kim et al. | Average utility driven data analytics on damped windows for intelligent systems with data streams |
CN103064966B (zh) | Method for extracting regular noise from single-record web pages |
CN108073701A (zh) | Method for mining rare patterns in multidimensional time-series data |
CN101604365B (zh) | System and method for determining the number of families of computer malware samples |
Liu et al. | Rare itemsets mining algorithm based on RP-Tree and spark framework |
Şenol et al. | Performance comparison of tree data structures in the data stream clustering problem (Turkish: Akan veri kümeleme probleminde ağaç veri yapılarının performans karşılaştırması) |
CN113900924B (zh) | Software defect prediction method and system based on a TAN semi-naive Bayesian network |
CN109993231A (zh) | Multi-label classification method based on frequent itemsets |
Li et al. | An Improved Algorithm for Mining Correlation Item Pairs |
Moko et al. | Big data and NoSQL databases architecture: a review |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
ASS | Succession or assignment of patent right | Owner name: BEIDA FANGZHENG GROUP CO. LTD.; Free format text: FORMER OWNER: INST. OF COMPUTER SCIENCE + TECHNOLOGY, BEIJING UNIV.; Effective date: 20131118. Owner name: BEIJING UNIV.; Free format text: FORMER OWNER: BEIDA FANGZHENG TECHN INST. CO., LTD., BEIJING; Effective date: 20131118 |
C41 | Transfer of patent application or patent right or utility model | |
COR | Change of bibliographic data | Free format text: CORRECT: ADDRESS; FROM: 100085 HAIDIAN, BEIJING TO: 100871 HAIDIAN, BEIJING |
TR01 | Transfer of patent right | Effective date of registration: 20131118; Address after: 100871 Beijing the Summer Palace Road, Haidian District, No. 5; Patentee after: Peking University; Patentee after: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.; Address before: 100085, fangzheng building, No. nine, five street, Beijing, Haidian District; Patentee before: PEKING University FOUNDER R & D CENTER; Patentee before: INST OF Co. SCIENCE & TECHNOL |
CX01 | Expiry of patent term | Granted publication date: 20041215 |