CN103970733B - A graph-structure-based method for recognizing new Chinese words - Google Patents
A graph-structure-based method for recognizing new Chinese words
- Publication number
- CN103970733B (application CN201410143875.7A)
- Authority
- CN
- China
- Prior art keywords
- neologisms
- alternative
- word
- occurrence rate
- backward
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
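Metric | Rule-based | Statistics-based | Graph-based |
---|---|---|---|
Short-word precision | 95% | 82% | 85% |
Short-word recall | 7% | 86% | 88% |
Long-word precision | 0% | 0% | 100% |
Long-word recall | 0% | 0% | 95% |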
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410143875.7A CN103970733B (zh) | 2014-04-10 | 2014-04-10 | A graph-structure-based method for recognizing new Chinese words |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410143875.7A CN103970733B (zh) | 2014-04-10 | 2014-04-10 | A graph-structure-based method for recognizing new Chinese words |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103970733A (zh) | 2014-08-06 |
CN103970733B (zh) | 2017-07-14 |
Family
ID=51240251
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410143875.7A Expired - Fee Related CN103970733B (zh) | A graph-structure-based method for recognizing new Chinese words | 2014-04-10 | 2014-04-10 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103970733B (zh) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
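CN108875040B (zh) * | 2015-10-27 | 2020-08-18 | Shanghai Zhizhen Intelligent Network Technology Co., Ltd. | Dictionary updating method and computer-readable storage medium |
CN105740236B (zh) * | 2016-01-29 | 2018-09-07 | Institute of Automation, Chinese Academy of Sciences | Method and system for recognizing new Chinese sentiment words by combining writing features and sequence features |
CN106202051B (zh) * | 2016-07-19 | 2019-01-29 | South China University of Technology | Method for discovering new words based on a directed weighted graph |
CN106610937A (zh) * | 2016-09-19 | 2017-05-03 | Sichuan Yonglian Information Technology Co., Ltd. | Automatic Chinese word segmentation algorithm based on information theory |
CN106598940A (zh) * | 2016-11-01 | 2017-04-26 | Sichuan Yonglian Information Technology Co., Ltd. | Text similarity algorithm based on globally optimized keyword quality |
CN106598941A (zh) * | 2016-11-01 | 2017-04-26 | Sichuan Yonglian Information Technology Co., Ltd. | Algorithm for globally optimizing text keyword quality |
US10831803B2 (en) * | 2018-07-26 | 2020-11-10 | Beijing Jingdong Shangke Information Technology Co., Ltd. | System and method for true product word recognition |
CN109522396B (zh) * | 2018-10-22 | 2020-12-25 | China Institute of Marine Technology and Economy | Knowledge processing method and system for the national defense science and technology field |
CN110807322B (zh) * | 2019-09-19 | 2024-03-01 | Ping An Technology (Shenzhen) Co., Ltd. | Method, apparatus, server, and storage medium for recognizing new words based on information entropy |
CN113157929A (zh) * | 2020-12-30 | 2021-07-23 | Longma Zhixin (Zhuhai Hengqin) Technology Co., Ltd. | New word mining method and apparatus, server, and computer-readable storage medium |
CN115879515B (zh) * | 2023-02-20 | 2023-05-12 | Jiangxi University of Finance and Economics | Document network topic modeling method, variational neighborhood encoder, terminal, and medium |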
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
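JP2002056009A (ja) * | 2000-05-29 | 2002-02-20 | Fuji Xerox Co Ltd | Document classification method and apparatus |
CN102375842A (zh) * | 2010-08-20 | 2012-03-14 | Yao Yinxiong | Method for evaluating and extracting a keyword set oriented to a whole domain |
CN103593427A (zh) * | 2013-11-07 | 2014-02-19 | Tsinghua University | New word search method and system |
CN103678656A (zh) * | 2013-12-23 | 2014-03-26 | Hefei University of Technology | Unsupervised automatic extraction method for new microblog words based on repeated strings |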
Also Published As
Publication number | Publication date |
---|---|
CN103970733A (zh) | 2014-08-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
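CN103970733B (zh) | | A graph-structure-based method for recognizing new Chinese words |
CN110781317B (zh) | | Event graph construction method and apparatus, and electronic device |
CN106991092B (zh) | | Method and device for mining similar judgment documents based on big data |
CN109376963B (zh) | | Neural-network-based method for jointly predicting criminal charges and law articles |
Li et al. | | Fine-grained location extraction from tweets with temporal awareness |
CN103336766B (zh) | | Short-text spam identification and modeling method and apparatus |
CN103853738B (zh) | | Method for identifying the region related to web page information |
CN107729468A (zh) | | Answer extraction method and system based on deep learning |
CN103324745A (zh) | | Text spam identification method and system based on a Bayesian model |
CN108875040A (zh) | | Dictionary updating method and computer-readable storage medium |
CN107748745B (zh) | | Enterprise name keyword extraction method |
CN104008166A (zh) | | Dialogue short-text clustering method based on morphological and semantic similarity |
CN102722709A (zh) | | Spam image identification method and apparatus |
WO2016177069A1 (zh) | | Management method and apparatus, spam SMS monitoring system, and computer storage medium |
CN103984943A (zh) | | Scene text recognition method based on a Bayesian probability framework |
CN110287292B (zh) | | Method and apparatus for predicting sentencing deviation in judgments |
CN110705292B (zh) | | Entity name extraction method based on a knowledge base and deep learning |
CN109271640A (zh) | | Method and apparatus for identifying the regional attribute of text information, and electronic device |
CN109145287A (zh) | | Indonesian word error detection and correction method and system |
CN103902733A (zh) | | Information retrieval method based on question word expansion |
CN110069769A (zh) | | Application label generation method and apparatus, and storage device |
CN103324641B (zh) | | Information record recommendation method and apparatus |
CN107291685B (zh) | | Semantic recognition method and semantic recognition system |
CN106874762A (zh) | | Android malicious code detection method based on an API dependency graph |
CN105224603A (zh) | | Training corpus acquisition method and apparatus |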
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | C06 | Publication | |
 | PB01 | Publication | |
 | C10 | Entry into substantive examination | |
 | SE01 | Entry into force of request for substantive examination | |
 | ASS | Succession or assignment of patent right | Owner name: BEIJING UNIV.; Effective date: 20150728; Owner name: CHINA INFORMATION TECHNOLOGY SECURITY EVALUATION C; Free format text: FORMER OWNER: BEIJING UNIV.; Effective date: 20150728 |
 | C41 | Transfer of patent application or patent right or utility model | |
 | C53 | Correction of patent for invention or patent application | |
 | CB03 | Change of inventor or designer information | Inventor after: Chen Haiqiang, Cheng Junjun, Zhou Xin, Wu Jiayi, Chen Wei, Wang Tengjiao; Inventor before: Wu Jiayi, Chen Wei, Wang Tengjiao |
 | COR | Change of bibliographic data | Free format text: CORRECT: INVENTOR; FROM: WU JIAYI, CHEN WEI, WANG TENGJIAO; TO: CHEN HAIQIANG, CHENG JUNJUN, ZHOU XIN, WU JIAYI, CHEN WEI, WANG TENGJIAO |
 | TA01 | Transfer of patent application right | Effective date of registration: 20150728; Address after: 100085 Building No. 8, No. 1 West Road, Beijing, Haidian District; Applicant after: China Information Technology Security Evaluation Center; Applicant after: Peking University; Address before: 100871 Haidian District the Summer Palace Road, No. 5, Peking University; Applicant before: Peking University |
 | GR01 | Patent grant | |
 | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20170714; Termination date: 20180410 |