CN109643315A - Method, system, computer device and computer-readable medium for automatically generating a Chinese ontology base from structured web knowledge - Google Patents
- Publication number
- CN109643315A (application number CN201780046326.XA)
- Authority
- CN
- China
- Prior art keywords
- concept
- interest
- chinese text
- text corpus
- knowledge
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
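A method, system, computer device and computer-readable medium for automatically generating a Chinese ontology base from structured web knowledge. The method comprises the steps of: crawling structured knowledge from a structured knowledge network, wherein the structured knowledge includes at least one concept of interest used for generating the automatic Chinese ontology base; filtering out irrelevant links; extracting knowledge about the concept of interest; discovering concepts associated with the concept of interest; inferring the semantic relatedness between the concept of interest and its associated concepts based on a cosine similarity measure; and storing the inferred semantic relatedness data. The invention provides a more efficient system and method for automatic Chinese ontology base generation, in order to cope with a rapidly developing data world and meet the needs of data users.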
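The steps listed in the abstract (filter links, discover associated concepts, score them by cosine similarity) can be illustrated with a minimal sketch. This is not the patented implementation, since the description and claims are not reproduced on this page. The `PAGES` data, the blocked link prefixes, the use of raw term counts as vectors, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_u * norm_v)


def filter_links(links, blocked_prefixes):
    """Drop links whose target starts with any blocked prefix (assumed rule)."""
    return [link for link in links if not link.startswith(tuple(blocked_prefixes))]


def rank_related(concept, pages, blocked_prefixes=("Help:", "Category:")):
    """Rank concepts linked from `concept` by cosine similarity of term counts."""
    target = Counter(pages[concept]["terms"])
    candidates = filter_links(pages[concept]["links"], blocked_prefixes)
    scores = {
        name: cosine_similarity(target, Counter(pages[name]["terms"]))
        for name in candidates
        if name in pages
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical, hand-written stand-in for crawled structured pages.
PAGES = {
    "tea": {
        "terms": ["leaf", "drink", "water", "leaf"],
        "links": ["green tea", "car", "Help:Editing"],
    },
    "green tea": {"terms": ["leaf", "drink", "leaf"], "links": []},
    "car": {"terms": ["engine", "wheel"], "links": []},
}

ranking = rank_related("tea", PAGES)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a real system the term-count vectors would likely be replaced by learned or weighted concept vectors, and the ranked pairs would be written to a store rather than printed.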
Description
PCT national-phase application; the description has been published.
Claims (43)
- PCT national-phase application; the claims have been published.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
HK16109078.8 | 2016-07-29 | ||
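HK16109078.8A HK1220319A2 (zh) | 2016-07-29 | 2016-07-29 | Method, system and computer-readable medium for automatic construction of a Chinese ontology base from structured web knowledge |
PCT/CN2017/094881 WO2018019289A1 (zh) | 2016-07-29 | 2017-07-28 | Method, system, computer device and computer-readable medium for automatically generating a Chinese ontology base from structured web knowledge |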
Publications (2)
Publication Number | Publication Date |
---|---|
CN109643315A (zh) | 2019-04-16 |
CN109643315B (zh) | 2024-05-07 |
Family
ID=58633644
Family Applications (1)
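Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201780046326.XA Active CN109643315B (zh) | 2016-07-29 | 2017-07-28 | Method, system, computer device and computer-readable medium for automatically generating a Chinese ontology base from structured web knowledge |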
Country Status (4)
Country | Link |
---|---|
CN (1) | CN109643315B (zh) |
HK (1) | HK1220319A2 (zh) |
TW (1) | TW201804345A (zh) |
WO (1) | WO2018019289A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111783422A (zh) * | 2020-06-24 | 2020-10-16 | 北京字节跳动网络技术有限公司 | Text sequence generation method, apparatus, device and medium |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
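CN110892399B (zh) * | 2017-06-16 | 2023-05-09 | 爱思唯尔有限公司 | System and method for automatically generating subject content summaries |
CN111859975B (zh) * | 2019-04-22 | 2024-08-16 | 广东小天才科技有限公司 | Method and system for expanding the corpus regular expressions of a sample corpus |
CN110502640A (zh) * | 2019-07-30 | 2019-11-26 | 江南大学 | Construction-based method for extracting the development of concept word meanings |
CN110851612B (zh) * | 2019-08-29 | 2023-08-18 | 国家计算机网络与信息安全管理中心 | Composite completion method and apparatus for a mobile-application knowledge graph based on encyclopedia knowledge |
CN115658931B (zh) * | 2022-12-27 | 2023-04-07 | 清华大学 | Method, apparatus, device and medium for dynamically updating an encyclopedia knowledge graph |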
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
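CN1728134A (zh) * | 2004-07-30 | 2006-02-01 | 国际商业机器公司 | Hypertext-based multilingual web information search method and system |
CN102609512A (zh) * | 2012-02-07 | 2012-07-25 | 北京中机科海科技发展有限公司 | System and method for heterogeneous information knowledge mining and visual analysis |
CN105518661A (zh) * | 2013-08-12 | 2016-04-20 | 微软技术许可有限责任公司 | Browsing images via mined hyperlinked text snippets |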
US20160132484A1 (en) * | 2014-11-10 | 2016-05-12 | Oracle International Corporation | Automatic generation of n-grams and concept relations from linguistic input data |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150019174A1 (en) * | 2013-07-09 | 2015-01-15 | Honeywell International Inc. | Ontology driven building audit system |
US9672197B2 (en) * | 2014-10-14 | 2017-06-06 | Sugarcrm Inc. | Universal rebranding engine |
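CN105488105B (zh) * | 2015-11-19 | 2019-11-05 | 百度在线网络技术(北京)有限公司 | Method for establishing an information extraction template, and method and apparatus for processing knowledge data |
CN105843965B (zh) * | 2016-04-20 | 2019-06-04 | 广东精点数据科技股份有限公司 | Deep web crawler form-filling method and apparatus based on URL topic classification |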
2016
- 2016-07-29: HK, application HK16109078.8A, patent HK1220319A2 (zh); status: not active (IP right cessation)
2017
- 2017-07-26: TW, application TW106125119A, patent TW201804345A (zh); status: unknown
- 2017-07-28: CN, application CN201780046326.XA, patent CN109643315B (zh); status: active
- 2017-07-28: WO, application PCT/CN2017/094881, patent WO2018019289A1 (zh); status: active (application filing)
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111783422A (zh) * | 2020-06-24 | 2020-10-16 | 北京字节跳动网络技术有限公司 | Text sequence generation method, apparatus, device and medium |
CN111783422B (zh) * | 2020-06-24 | 2022-03-04 | 北京字节跳动网络技术有限公司 | Text sequence generation method, apparatus, device and medium |
US11669679B2 (en) | 2020-06-24 | 2023-06-06 | Beijing Byledance Network Technology Co., Ltd. | Text sequence generating method and apparatus, device and medium |
Also Published As
Publication number | Publication date |
---|---|
TW201804345A (zh) | 2018-02-01 |
CN109643315B (zh) | 2024-05-07 |
WO2018019289A1 (zh) | 2018-02-01 |
HK1220319A2 (zh) | 2017-04-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109643315B (zh) | | Method, system, computer device and computer-readable medium for automatically generating a Chinese ontology base from structured web knowledge |
US20090070322A1 (en) | | Browsing knowledge on the basis of semantic relations |
US20130138586A1 (en) | | Service goal interpreting apparatus and method for goal-driven semantic service discovery |
AU2019201531A1 (en) | | An in-app conversational question answering assistant for product help |
Dong et al. | | A survey in semantic search technologies |
Cao et al. | | Recommending questions using the mdl-based tree cut model |
CN101393565A (zh) | | Ontology-based search method for virtual museums |
EP2192503A1 (en) | | Optimised tag based searching |
Oliveira et al. | | Semantic annotation tools survey |
CN114117242A (zh) | | Data query method and apparatus, computer device, and storage medium |
Babekr et al. | | Personalized semantic retrieval and summarization of web based documents |
Afuan et al. | | A new approach in query expansion methods for improving information retrieval |
WO2012091541A1 (en) | | A semantic web constructor system and a method thereof |
Kramár et al. | | Disambiguating search by leveraging a social context based on the stream of user’s activity |
WO2009035871A1 (en) | | Browsing knowledge on the basis of semantic relations |
KR100659370B1 (ko) | | Method for forming a document DB by thesaurus matching, and information retrieval method |
CN112100500A (zh) | | Example-learning-driven method for discovering content-associated websites |
Hsu et al. | | Using domain ontology to implement a frequently asked questions system |
Enhong et al. | | Semi-structured data extraction and schema knowledge mining |
Popescu et al. | | Using semantic commonsense resources in image retrieval |
Yokoo et al. | | Semantics-based news delivering service |
Sharma et al. | | Improved stemming approach used for text processing in information retrieval system |
Annalakshmi et al. | | Structuring of Web Pages using XML Framework for Information Filtering |
Khusro et al. | | Issues and challenges in book information retrieval |
Menemencioğlu et al. | | A Review on Semantic Text and Multimedia Retrieval and Recent Trends |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||