CN1180377C - A method for text mining of semi-structured document collections - Google Patents
A method for text mining of semi-structured document collections
- Publication number: CN1180377C (application CNB021290458A, CN02129045A)
- Authority: CN (China)
- Prior art keywords: node, document, semi, information, vector
- Prior art date
- Legal status: Expired - Lifetime (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Comparison of results based on the TFIDF vector space model and on the structured link vector model:
Mi | Nj (TFIDF) | M(ni,j) (TFIDF) | M(F(i,j)) (TFIDF) | Nj (structured link) | M(ni,j) (structured link) | M(F(i,j)) (structured link) |
---|---|---|---|---|---|---|
63 | 60 | 37 | 0.602 | 59 | 52 | 0.852 |
76 | 69 | 53 | 0.731 | 71 | 62 | 0.844 |
82 | 88 | 62 | 0.729 | 89 | 79 | 0.924 |
86 | 87 | 68 | 0.786 | 86 | 74 | 0.860 |
73 | 67 | 49 | 0.700 | 70 | 60 | 0.839 |
61 | 78 | 41 | 0.590 | 69 | 53 | 0.815 |
45 | 45 | 32 | 0.711 | 42 | 37 | 0.851 |
54 | 63 | 38 | 0.650 | 58 | 41 | 0.732 |
66 | 74 | 52 | 0.743 | 71 | 58 | 0.847 |
38 | 28 | 20 | 0.606 | 35 | 31 | 0.849 |
76 | 68 | 53 | 0.736 | 72 | 64 | 0.865 |
42 | 35 | 23 | 0.597 | 40 | 32 | 0.780 |
Overall | | | F=0.69 | | | F=0.84 |
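The record does not reproduce the formula behind the F columns, so the following is only a sketch under stated assumptions. It assumes the usual clustering F-measure: Mi is the size of a reference class, Nj the size of the best-matching cluster, and ni,j the number of documents they share, with precision ni,j/Nj, recall ni,j/Mi, and an overall F weighted by class size. The names `f_measure` and `overall_f` are hypothetical. Under these assumptions the tabulated values can be recomputed from the counts:

```python
# Counts copied from the table, one tuple per class: (M_i, N_j, n_ij).
TFIDF = [
    (63, 60, 37), (76, 69, 53), (82, 88, 62), (86, 87, 68),
    (73, 67, 49), (61, 78, 41), (45, 45, 32), (54, 63, 38),
    (66, 74, 52), (38, 28, 20), (76, 68, 53), (42, 35, 23),
]
STRUCT_LINK = [
    (63, 59, 52), (76, 71, 62), (82, 89, 79), (86, 86, 74),
    (73, 70, 60), (61, 69, 53), (45, 42, 37), (54, 58, 41),
    (66, 71, 58), (38, 35, 31), (76, 72, 64), (42, 40, 32),
]


def f_measure(m_i: int, n_j: int, n_ij: int) -> float:
    """Harmonic mean of precision and recall (assumed definition)."""
    precision = n_ij / n_j  # share of the cluster that belongs to the class
    recall = n_ij / m_i     # share of the class captured by the cluster
    return 2 * precision * recall / (precision + recall)


def overall_f(rows: list[tuple[int, int, int]]) -> float:
    """Class-size-weighted average of the per-class F values (assumed)."""
    total = sum(m_i for m_i, _, _ in rows)
    return sum(m_i * f_measure(m_i, n_j, n_ij) for m_i, n_j, n_ij in rows) / total


if __name__ == "__main__":
    for name, rows in (("TFIDF", TFIDF), ("structured link", STRUCT_LINK)):
        per_class = [round(f_measure(*row), 3) for row in rows]
        print(name, per_class, round(overall_f(rows), 2))
```

With these definitions the first row gives 0.602 and 0.852, and the weighted totals round to 0.69 and 0.84, which matches the table. That agreement supports the assumed formula but does not confirm it is the one used in the patent.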
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB021290458A (CN1180377C) | 2002-08-29 | 2002-08-29 | A method for text mining of semi-structured document collections |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB021290458A (CN1180377C) | 2002-08-29 | 2002-08-29 | A method for text mining of semi-structured document collections |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1399228A | 2003-02-26 |
CN1180377C | 2004-12-15 |
Family
ID=4746113
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB021290458A (CN1180377C, Expired - Lifetime) | A method for text mining of semi-structured document collections | 2002-08-29 | 2002-08-29 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1180377C (zh) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
SG133421A1 (en) * | 2005-12-13 | 2007-07-30 | Singapore Tech Dynamics Pte | Method and apparatus for an algorithm development environment for solving a class of real-life combinatorial optimization problems |
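CN100418086C (zh) * | 2006-08-22 | 2008-09-10 | Beijing Founder Electronics Co., Ltd. | A method for variable-data typesetting of text |
CN100447793C (zh) * | 2007-01-10 | 2008-12-31 | Soochow University | Method for extracting page query interfaces based on visual features |
CN102436480B (zh) * | 2011-10-15 | 2013-11-06 | Xi'an Jiaotong University | A text-oriented method for mining association relationships between knowledge units |
CN104063411B (zh) * | 2013-09-12 | 2016-05-25 | Jiangsu Jinge Network Technology Co., Ltd. | Enterprise intelligence collection method based on Porter's five forces model |
CN107943986B (zh) * | 2017-11-30 | 2022-05-17 | Ruishi Zhijue (Shenzhen) Algorithm Technology Co., Ltd. | A big data analysis and mining system |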
- 2002-08-29: CN application CNB021290458A, patent CN1180377C, status: not active, Expired - Lifetime
Also Published As
Publication number | Publication date |
---|---|
CN1399228A (zh) | 2003-02-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
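Ding et al. | | Research on data stream clustering algorithms |
CN109359172B (zh) | | An entity alignment optimization method based on graph partitioning |
CN106815369A (zh) | | A text classification method based on the Xgboost classification algorithm |
CN111597347A (zh) | | Knowledge-embedded defect report reconstruction method and device |
Nam et al. | | Efficient approach for damped window-based high utility pattern mining with list structure |
CN109325019A (zh) | | Method for constructing a data association relationship network |
Nguyen et al. | | Efficient algorithms for mining colossal patterns in high dimensional databases |
Wu et al. | | Generalized association rule mining using an efficient data structure |
CN103544186A (zh) | | Method and device for mining topic keywords in pictures |
CN103123685B (zh) | | Text pattern recognition method |
CN1180377C (zh) | | A method for text mining of semi-structured document collections |
CN115248863A (zh) | | Oil and gas geological evaluation method and system based on a knowledge graph |
Yun et al. | | An efficient approach for mining weighted approximate closed frequent patterns considering noise constraints |
Bifet et al. | | Mining adaptively frequent closed unlabeled rooted trees in data streams |
CN102541935A (zh) | | A novel feature-vector-based representation method for Chinese Web documents |
CN105653567A (zh) | | A method for quickly finding feature strings in text sequence data |
CN1766871A (zh) | | Processing method for context-based semantic extraction of semi-structured data |
CN111026862A (zh) | | An incremental entity summarization method based on formal concept analysis |
Nguyen et al. | | Graph mining based on a data partitioning approach |
Wang et al. | | Closed inter-sequence pattern mining |
CN112231438A (zh) | | A method and device for mining closed itemsets and generators |
CN113900924B (zh) | | Software defect prediction method and system based on a TAN semi-naive Bayesian network |
CN111061884B (zh) | | A method for constructing a K12 education knowledge graph based on DeepDive |
Nadimi-Shahraki et al. | | A new method for mining maximal frequent itemsets |
CN118377854B (zh) | | An integration method and platform for full-innovation-chain scientific and technological intelligence services |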
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | C06 | Publication | |
 | PB01 | Publication | |
 | C10 | Entry into substantive examination | |
 | SE01 | Entry into force of request for substantive examination | |
 | C14 | Grant of patent or utility model | |
 | GR01 | Patent grant | |
 | ASS | Succession or assignment of patent right | Owner name: BEIDA FANGZHENG GROUP CO. LTD.; Free format text: FORMER OWNER: INST. OF COMPUTER SCIENCE + TECHNOLOGY, BEIJING UNIV.; Effective date: 20131118. Owner name: BEIJING UNIV.; Free format text: FORMER OWNER: BEIDA FANGZHENG TECHN INST. CO., LTD., BEIJING; Effective date: 20131118 |
 | C41 | Transfer of patent application or patent right or utility model | |
 | COR | Change of bibliographic data | Free format text: CORRECT: ADDRESS; FROM: 100085 HAIDIAN, BEIJING TO: 100871 HAIDIAN, BEIJING |
 | TR01 | Transfer of patent right | Effective date of registration: 20131118. Address after: 100871 Beijing the Summer Palace Road, Haidian District, No. 5. Patentee after: Peking University; PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd. Address before: 100085, fangzheng building, No. nine, five street, Beijing, Haidian District. Patentee before: PEKING University FOUNDER R & D CENTER; INST OF Co. SCIENCE & TECHNOL |
 | CX01 | Expiry of patent term | Granted publication date: 20041215 |