CN101673306B - Web page information query method and system - Google Patents
- Publication number
- CN101673306B (application number CN2009102360570A, also written CN200910236057A)
- Authority
- CN
- China
- Prior art keywords
- classifier
- text
- website
- classification
- query result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
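Feature | Result | Threshold |
---|---|---|
Text density of each line | Selected | 0.45 |
HTML byte count of each line | Discarded | |
Text length of each line | Selected | x<=30, 100<=x<=200 |
Whether the previous line was judged to be main text | Selected | yy and nn combinations |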
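As an illustration of how the per-line features in the table above might be computed, the following Python sketch derives text length and text density for one line of HTML and compares them with the listed thresholds. The definition of text density (visible-text bytes divided by HTML bytes), the direction of the 0.45 comparison, the regex-based tag stripping, and the names `line_features` and `LineFeatures` are all assumptions made for illustration; the table only gives feature names and threshold values. The previous-line feature is left out because the table does not say how the yy/nn combinations are encoded.

```python
import re
from dataclasses import dataclass

# Crude tag stripper; an assumption, not the patent's extraction method.
TAG_RE = re.compile(r"<[^>]*>")

# Threshold value taken from the feature table.
DENSITY_THRESHOLD = 0.45


@dataclass
class LineFeatures:
    text_length: int                 # characters of visible text
    text_density: float              # assumed: text bytes / HTML bytes
    density_above_threshold: bool    # assumed direction: density >= 0.45
    length_in_selected_range: bool   # x <= 30 or 100 <= x <= 200


def line_features(html_line: str) -> LineFeatures:
    """Compute hypothetical per-line features for one line of HTML source."""
    text = TAG_RE.sub("", html_line).strip()
    html_bytes = len(html_line.encode("utf-8"))
    text_bytes = len(text.encode("utf-8"))
    # Guard against empty lines to avoid division by zero.
    density = text_bytes / html_bytes if html_bytes else 0.0
    n = len(text)
    return LineFeatures(
        text_length=n,
        text_density=density,
        density_above_threshold=density >= DENSITY_THRESHOLD,
        length_in_selected_range=(n <= 30) or (100 <= n <= 200),
    )
```

In this sketch the HTML byte count is used only as the denominator of the density, which is consistent with the table marking it as discarded as a standalone feature.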
Claims (16)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009102360570A CN101673306B (zh) | 2009-10-19 | 2009-10-19 | Web page information query method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009102360570A CN101673306B (zh) | 2009-10-19 | 2009-10-19 | Web page information query method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101673306A (zh) | 2010-03-17 |
CN101673306B (zh) | 2011-08-24 |
Family
ID=42020529
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2009102360570A Active CN101673306B (zh) | Web page information query method and system | 2009-10-19 | 2009-10-19 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101673306B (zh) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
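CN102609539B (zh) * | 2012-02-16 | 2015-06-10 | Beijing Sogou Information Service Co., Ltd. | A search method and system |
CN103678310B (zh) * | 2012-08-31 | 2018-04-27 | Tencent Technology (Shenzhen) Co., Ltd. | Method and device for classifying web page topics |
CN103838744B (zh) * | 2012-11-22 | 2019-01-15 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and device for query term requirement analysis |
CN103942203A (zh) * | 2013-01-18 | 2014-07-23 | Peking University Founder Group Co., Ltd. | Information processing method and topic information base creation system |
CN105468627A (zh) * | 2014-09-04 | 2016-04-06 | Wistron Corporation | Method and system for blocking and filtering web page content |
CN105512225A (zh) * | 2015-11-30 | 2016-04-20 | Peking University Founder Group Co., Ltd. | Method and device for extracting main content from a web page |
CN106951422B (zh) * | 2016-01-07 | 2021-05-28 | Tencent Technology (Shenzhen) Co., Ltd. | Web page training method and device, and search intent recognition method and device |
CN107423349A (zh) * | 2017-05-18 | 2017-12-01 | Fujian Zhongjin Online Information Technology Co., Ltd. | A full-text search method and system |
CN108763200A (zh) * | 2018-05-15 | 2018-11-06 | DataGrand Information Technology (Shanghai) Co., Ltd. | Chinese word segmentation method and device |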
- 2009-10-19: CN application CN2009102360570A, patent CN101673306B (zh), status Active
Also Published As
Publication number | Publication date |
---|---|
CN101673306A (zh) | 2010-03-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
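CN101673306B (zh) | | Web page information query method and system |
Günther et al. | | Word counts and topic models: Automated text analysis methods for digital journalism research |
CN107577759B (zh) | | Automatic user comment recommendation method |
CN111143479B (zh) | | Knowledge graph relation extraction and REST service visualization fusion method based on the DBSCAN clustering algorithm |
CN103914478B (zh) | | Web page training method and system, and web page prediction method and system |
Yao et al. | | Research on news keyword extraction technology based on TF-IDF and TextRank |
CN106156204B (zh) | | Text label extraction method and device |
CN101593200B (zh) | | Chinese web page classification method based on keyword frequency analysis |
CN111177591B (zh) | | Knowledge-graph-based Web data optimization method for visualization requirements |
CN102929873B (zh) | | Method and device for extracting search value words based on contextual search |
CN100595760C (zh) | | Method and device for acquiring spoken-language entries, and an input method system |
CN111190900B (zh) | | JSON data visualization optimization method in cloud computing mode |
WO2015149533A1 (zh) | | Method and device for word segmentation based on web page content classification |
KR100974064B1 (ko) | | User-customized information providing system and method |
WO2010014082A1 (en) | | Method and apparatus for relating datasets by using semantic vectors and keyword analyses |
CN110232126B (zh) | | Hotspot mining method, server, and computer-readable storage medium |
CN101556596B (zh) | | Input method system and intelligent word-formation method |
CN107506472B (zh) | | Method for classifying web pages browsed by students |
CN108875065B (zh) | | Content-based Indonesian news web page recommendation method |
CN104199833A (zh) | | Clustering method and clustering device for web search terms |
CN103678422A (zh) | | Web page classification method and device, and web page classifier training method and device |
JP2006293767A (ja) | | Text classification device, text classification method, and classification dictionary creation device |
KR101059557B1 (ko) | | Information retrieval method and computer-readable recording medium storing a program for performing the same |
KR100973969B1 (ko) | | News service system and method for mitigating the effects of media bias |
Amini | | Interactive learning for text summarization |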
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
ASS | Succession or assignment of patent right | Owner name: CLOUD COMPUTING INDUSTRIAL TECHNOLOGY INNOVATION A; Free format text: FORMER OWNER: INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES; Effective date: 20140509 |
C41 | Transfer of patent application or patent right or utility model | ||
COR | Change of bibliographic data | Free format text: CORRECT: ADDRESS; FROM: 100080 HAIDIAN, BEIJING TO: 523808 DONGGUAN, GUANGDONG PROVINCE |
TR01 | Transfer of patent right | Effective date of registration: 20140509; Address after: 14 No. 523808 Keyuan pine in Guangdong Province, Dongguan Songshan Lake high tech Industrial Development Zone; Patentee after: Dongguan Cloud Computing Technology Innovation and Cultivation Center, Chinese Academy of Sciences; Address before: 100080 Haidian District, Zhongguancun Academy of Sciences, South Road, No. 6, No.; Patentee before: Institute of Computing Technology, Chinese Academy of Sciences |
ASS | Succession or assignment of patent right | Owner name: GUANGDONG ZHONGKE YUNFU VENTURE CAPITAL CO., LTD.; Free format text: FORMER OWNER: CLOUD COMPUTING INDUSTRIAL TECHNOLOGY INNOVATION AND INCUBATION CENTER CHINESE ACADEMY OF SCIENCES DONGGUAN; Effective date: 20150818 |
C41 | Transfer of patent application or patent right or utility model | ||
TR01 | Transfer of patent right | Effective date of registration: 20150818; Address after: 523808 Guangdong province Dongguan Songshan Lake high tech Industrial Development Zone No. 14 Keyuan pine floor 3 floor Room 305; Patentee after: Guangdong Zhongke Yunfu Venture Investment Company Limited; Address before: 14 No. 523808 Keyuan pine in Guangdong Province, Dongguan Songshan Lake high tech Industrial Development Zone; Patentee before: Dongguan Cloud Computing Technology Innovation and Cultivation Center, Chinese Academy of Sciences |
TR01 | Transfer of patent right | Effective date of registration: 20170811; Address after: 22 No. 334000 Jiangxi city of Shangrao province Xinzhou District Chaoyang Road; Patentee after: Shangrao Zhongke letter cloud industry information technology Co., Ltd.; Address before: 523808 Guangdong province Dongguan Songshan Lake high tech Industrial Development Zone No. 14 Keyuan pine floor 3 floor Room 305; Patentee before: Guangdong Zhongke Yunfu Venture Investment Company Limited |
TR01 | Transfer of patent right | ||
TR01 | Transfer of patent right | ||
TR01 | Transfer of patent right | Effective date of registration: 20210817; Address after: 523000 room 2304, building 1, No. 1, Kehui Road, Songshanhu Park, Dongguan City, Guangdong Province; Patentee after: Guangdong Zhongke Yunfu Venture Investment Co.,Ltd.; Address before: No.22 Chaoyang Avenue, Xinzhou District, Shangrao City, Jiangxi Province; Patentee before: Shangrao Zhongke letter cloud industry information technology Co.,Ltd. |