CN103914534B - Text content classification method based on an expert-system URL classification knowledge base - Google Patents
- Publication number
- CN103914534B (application number CN201410127141.XA)
- Authority
- CN
- China
- Prior art keywords
- knowledge
- url
- reasoning
- content
- domain
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Index | "Full URL" content classification rule in the hash list | Category | Confidence |
---|---|---|---|
0 | launcher.warcraftchina.com/2.0/?locale=zh-CN | Online games | 3.15% |
1 | www.222tk.com/ | Lottery | 2.87% |
2 | street.yoka.com/clockbeauty/ | Fashion | 2.45% |
3 | 3g.eastmoney.com/Money.aspx | Finance | 1.67% |
4 | house.lsfc.net.cn/sellinfo.asp?id=1097356 | Real estate | 1.54% |
… |
Index | "First-level domain" content classification rule in the hash list | Confidence |
---|---|---|
0 | Entry=sina.com.cn | 4.32% |
1 | Entry=sohu.com | 3.98% |
2 | Entry=ifeng.com | 3.45% |
3 | Entry=sina.cn | 2.65% |
4 | Entry=qidian.cn | 2.14% |
… |
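The two hash lists above can be read as keyed lookup tables. The following is a minimal sketch of how such a lookup might work, not the patented procedure itself. The names `FULL_URL_RULES`, `DOMAIN_RULES`, `strip_scheme`, and `lookup` are hypothetical. The lookup order (exact full URL first, then first-level domain) and the suffix-matching step are assumptions made for illustration. Category labels are given in English translation, and only the rows shown in the tables are included.

```python
# Sample rows copied from the two tables above (category labels translated);
# the remaining rows are elided in the source, so this is illustrative only.
FULL_URL_RULES = {
    "launcher.warcraftchina.com/2.0/?locale=zh-CN": ("Online games", 0.0315),
    "www.222tk.com/": ("Lottery", 0.0287),
    "street.yoka.com/clockbeauty/": ("Fashion", 0.0245),
    "3g.eastmoney.com/Money.aspx": ("Finance", 0.0167),
    "house.lsfc.net.cn/sellinfo.asp?id=1097356": ("Real estate", 0.0154),
}

# The domain table lists only an entry and a confidence, with no category column.
DOMAIN_RULES = {
    "sina.com.cn": 0.0432,
    "sohu.com": 0.0398,
    "ifeng.com": 0.0345,
    "sina.cn": 0.0265,
    "qidian.cn": 0.0214,
}


def strip_scheme(url):
    """Drop a leading scheme such as 'http://' so the key matches the table format."""
    return url.split("://", 1)[1] if "://" in url else url


def lookup(url):
    """Hypothetical lookup: exact full-URL match first, then domain suffix match."""
    key = strip_scheme(url)
    if key in FULL_URL_RULES:
        category, confidence = FULL_URL_RULES[key]
        return ("full_url", key, category, confidence)

    # The host is everything before the first '/', minus any port.
    host = key.split("/", 1)[0].split(":", 1)[0].lower()
    labels = host.split(".")
    # Try progressively shorter suffixes: a.b.c.d, then b.c.d, then c.d.
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in DOMAIN_RULES:
            return ("domain", candidate, None, DOMAIN_RULES[candidate])
    return None


if __name__ == "__main__":
    print(lookup("http://www.222tk.com/"))
    print(lookup("https://news.sina.com.cn/china/"))
    print(lookup("https://example.org/"))
```

Matching host suffixes against the rule keys avoids needing a public suffix list to decide what counts as a first-level domain (for example `sina.com.cn` versus `com.cn`). A production system would likely need a more careful definition.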
Claims (2)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410127141.XA CN103914534B (zh) | 2014-03-31 | 2014-03-31 | Text content classification method based on an expert-system URL classification knowledge base |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410127141.XA CN103914534B (zh) | 2014-03-31 | 2014-03-31 | Text content classification method based on an expert-system URL classification knowledge base |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103914534A (zh) | 2014-07-09 |
CN103914534B (zh) | 2017-03-15 |
Family
ID=51040214
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410127141.XA Active CN103914534B (zh) | Text content classification method based on an expert-system URL classification knowledge base | 2014-03-31 | 2014-03-31 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103914534B (zh) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105045782A (zh) * | 2014-11-14 | 2015-11-11 | 国家电网公司 | Method for constructing a ferroresonance fault knowledge base |
CN104820674B (zh) * | 2015-04-02 | 2018-04-27 | 北京网康科技有限公司 | Web page classification method and device |
CN107257390B (zh) * | 2017-05-27 | 2020-10-09 | 北京思特奇信息技术股份有限公司 | Method and system for parsing URL addresses |
CN108197638B (zh) * | 2017-12-12 | 2020-03-20 | 阿里巴巴集团控股有限公司 | Method and device for classifying samples to be evaluated |
CN109522461B (zh) * | 2018-10-08 | 2021-02-05 | 厦门快商通信息技术有限公司 | URL cleaning method and system based on regular expressions |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7590707B2 (en) * | 2006-08-07 | 2009-09-15 | Webroot Software, Inc. | Method and system for identifying network addresses associated with suspect network destinations |
US8307431B2 (en) * | 2008-05-30 | 2012-11-06 | At&T Intellectual Property I, L.P. | Method and apparatus for identifying phishing websites in network traffic using generated regular expressions |
CN102819591A (zh) * | 2012-08-07 | 2012-12-12 | 北京网康科技有限公司 | Content-based web page classification method and system |
CN102955810A (zh) * | 2011-08-26 | 2013-03-06 | 中国移动通信集团公司 | Web page classification method and device |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7590707B2 (en) * | 2006-08-07 | 2009-09-15 | Webroot Software, Inc. | Method and system for identifying network addresses associated with suspect network destinations |
US8307431B2 (en) * | 2008-05-30 | 2012-11-06 | At&T Intellectual Property I, L.P. | Method and apparatus for identifying phishing websites in network traffic using generated regular expressions |
CN102955810A (zh) * | 2011-08-26 | 2013-03-06 | 中国移动通信集团公司 | Web page classification method and device |
CN102819591A (zh) * | 2012-08-07 | 2012-12-12 | 北京网康科技有限公司 | Content-based web page classification method and system |
Non-Patent Citations (3)
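Title |
---|
"Fast Webpage Classification Using URL Features"; Min-Yen Kan et al.; Proceedings of the 14th ACM International Conference on Information and Knowledge Management; 20051031; full text * |
"Query classification method based on URL topics"; Zhang Yu et al.; 《计算机研究与发展》 (Journal of Computer Research and Development); 20120813; Vol. 49 (No. 6); full text * |
"Phishing URL detection based on domain name information"; Zheng Lixiong et al.; 《计算机工程》 (Computer Engineering); 20120531; Vol. 38 (No. 10); full text * |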
Also Published As
Publication number | Publication date |
---|---|
CN103914534A (zh) | 2014-07-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
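CN103902703B (zh) | Text content classification method based on mobile Internet access | |
CN103914534B (zh) | Text content classification method based on an expert-system URL classification knowledge base | |
CN108364028A (zh) | Automatic Internet website classification method based on deep learning | |
CN103218431B (zh) | System capable of identifying and automatically collecting web page information | |
CN103546326B (zh) | Method for website traffic statistics | |
CN101820366B (zh) | Prefetch-based phishing web page detection method | |
CN102831234B (zh) | Personalized news recommendation device and method based on news content and topic features | |
CN103914478B (zh) | Web page training method and system, and web page prediction method and system | |
CN105138558B (zh) | Real-time personalized information collection method based on user-accessed content | |
CN102819591B (zh) | Content-based web page classification method and system | |
CN103955842B (zh) | Online advertisement recommendation system and method for large-scale media data | |
CN107220295A (zh) | Method for searching civil dispute mediation cases and recommending mediation strategies | |
CN107169001A (zh) | Text classification model optimization method based on crowdsourced feedback and active learning | |
CN104166668A (zh) | News recommendation system and method based on the FOLFM model | |
CN106156372B (zh) | Internet website classification method and device | |
CN106383887A (zh) | Method and system for collecting, recommending, and displaying environmental protection news data | |
CN104077407B (zh) | Intelligent data search system and method | |
CN103268350A (zh) | Internet public opinion information monitoring system and monitoring method | |
CN107341183A (zh) | Website classification method based on comprehensive features of dark web sites | |
CN105468744A (zh) | Big data platform for tax-related public opinion analysis and full-text retrieval | |
CN103838886A (zh) | Text content classification method based on a representative-word knowledge base | |
CN104809252A (zh) | Internet data extraction system | |
CN103942268A (zh) | Method and device for combining search with applications, and application interface | |
CN108733791A (zh) | Network event detection method | |
CN102043811A (zh) | Medical information evaluation method and system |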
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C41 | Transfer of patent application or patent right or utility model | ||
TA01 | Transfer of patent application right | Effective date of registration: 20151228. Address after: 110020 Shenyang, Liaoning, Tiexi District, No. nine small road 12 3-7-1. Applicant after: Guo Lei. Address before: 110043, Dadong Road, Dadong District, Liaoning, 134, two gate, two floor, Shenyang. Applicant before: LIAONING SIWEI SCIENCE AND TECHNOLOGY DEVELOPMENT CO., LTD. |
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20200119. Address after: 525200 Yunjie Gem Village, Shanmei Street, Gaozhou City, Maoming City, Guangdong Province. Patentee after: Chen Kun. Address before: 110020, No. 12, No. nine, Tiexi Road, Shenyang District, Liaoning, 3-7-1. Patentee before: Guo Lei |
TR01 | Transfer of patent right | ||
TR01 | Transfer of patent right | Effective date of registration: 20200420. Address after: 200120 unit B, C, D, e, floor 4, building 3, No. 100, Lane 1505, Zuchongzhi Road, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai. Patentee after: SHANGHAI PUDONG SOFTWARE PARK INFORMATION TECHNOLOGY Co.,Ltd. Address before: 525200 Yunjie Gem Village, Shanmei Street, Gaozhou City, Maoming City, Guangdong Province. Patentee before: Chen Kun |
TR01 | Transfer of patent right |