CN106650803A - A method and device for calculating similarity between character strings - Google Patents
- Publication number
- CN106650803A
- Authority
- CN
- China
- Prior art keywords
- weight
- character string
- vocabulary
- sequence
- core
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Data Mining & Analysis (AREA)
- Probability & Statistics with Applications (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Machine Translation (AREA)
Abstract
Description
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611130125.1A / CN106650803B (zh) | 2016-12-09 | 2016-12-09 | A method and device for calculating similarity between character strings |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611130125.1A / CN106650803B (zh) | 2016-12-09 | 2016-12-09 | A method and device for calculating similarity between character strings |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106650803A (zh) | 2017-05-10 |
CN106650803B (zh) | 2019-06-18 |
Family
ID=58824810
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611130125.1A (Active) / CN106650803B (zh) | A method and device for calculating similarity between character strings | 2016-12-09 | 2016-12-09 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106650803B (zh) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107273359A (zh) * | 2017-06-20 | 2017-10-20 | 北京四海心通科技有限公司 | A text similarity determination method |
CN108681535A (zh) * | 2018-04-11 | 2018-10-19 | 广州视源电子科技股份有限公司 | Candidate word evaluation method, device, computer equipment and storage medium |
CN109165326A (zh) * | 2018-08-16 | 2019-01-08 | 蜜小蜂智慧(北京)科技有限公司 | A character string matching method and device |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
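CN101826099A (zh) * | 2010-02-04 | 2010-09-08 | 蓝盾信息安全技术股份有限公司 | A method and system for identifying similar documents and determining document diffusion degree |
CN102955857A (zh) * | 2012-11-09 | 2013-03-06 | 北京航空航天大学 | A text clustering method based on class-center compression transformation in a search engine |
CN102982291A (zh) * | 2012-11-05 | 2013-03-20 | 北京奇虎科技有限公司 | Method and device for obtaining digital signatures of trusted files |
CN102184169B (zh) * | 2011-04-20 | 2013-06-19 | 北京百度网讯科技有限公司 | Method, device and equipment for determining similarity information between pieces of character string information |
CN103207905A (zh) * | 2013-03-28 | 2013-07-17 | 大连理工大学 | A method for calculating text similarity based on a target text |
CN104008166A (zh) * | 2014-05-30 | 2014-08-27 | 华东师范大学 | A dialogue short-text clustering method based on morphological and semantic similarity |
CN104778171A (zh) * | 2014-01-10 | 2015-07-15 | 携程计算机技术(上海)有限公司 | Character string matching system and method |
CN105512480A (zh) * | 2015-12-04 | 2016-04-20 | 上海交通大学 | Edit-distance-based data optimization processing method for wearable devices |
CN106033416A (zh) * | 2015-03-09 | 2016-10-19 | 阿里巴巴集团控股有限公司 | A character string processing method and device |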
- 2016-12-09: CN application CN201611130125.1A, patent CN106650803B (zh), status Active
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
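CN101826099A (zh) * | 2010-02-04 | 2010-09-08 | 蓝盾信息安全技术股份有限公司 | A method and system for identifying similar documents and determining document diffusion degree |
CN102184169B (zh) * | 2011-04-20 | 2013-06-19 | 北京百度网讯科技有限公司 | Method, device and equipment for determining similarity information between pieces of character string information |
CN102982291A (zh) * | 2012-11-05 | 2013-03-20 | 北京奇虎科技有限公司 | Method and device for obtaining digital signatures of trusted files |
CN102955857A (zh) * | 2012-11-09 | 2013-03-06 | 北京航空航天大学 | A text clustering method based on class-center compression transformation in a search engine |
CN103207905A (zh) * | 2013-03-28 | 2013-07-17 | 大连理工大学 | A method for calculating text similarity based on a target text |
CN104778171A (zh) * | 2014-01-10 | 2015-07-15 | 携程计算机技术(上海)有限公司 | Character string matching system and method |
CN104008166A (zh) * | 2014-05-30 | 2014-08-27 | 华东师范大学 | A dialogue short-text clustering method based on morphological and semantic similarity |
CN106033416A (zh) * | 2015-03-09 | 2016-10-19 | 阿里巴巴集团控股有限公司 | A character string processing method and device |
CN105512480A (zh) * | 2015-12-04 | 2016-04-20 | 上海交通大学 | Edit-distance-based data optimization processing method for wearable devices |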
Also Published As
Publication number | Publication date |
---|---|
CN106650803B (zh) | 2019-06-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105095204B (zh) | | Method and device for acquiring synonyms |
CN104636466B (zh) | | An entity attribute extraction method and system for open web pages |
Mori et al. | | A machine learning approach to recipe text processing |
CN102693279B (zh) | | A method, device and system for quickly calculating comment similarity |
CN106844331A (zh) | | A sentence similarity calculation method and system |
CN110502642A (zh) | | An entity relation extraction method based on dependency parsing and rules |
CN107180045A (zh) | | A method for extracting geographic entity relations contained in Internet text |
Saloot et al. | | An architecture for Malay Tweet normalization |
CN108959630A (zh) | | A person attribute extraction method for unstructured English text |
JP2014052863A (ja) | | Information processing device, information processing system, and information processing method |
WO2014002774A1 (ja) | | Synonym extraction system, method, and recording medium |
CN109213998A (zh) | | Chinese typo detection method and system |
CN106650803A (zh) | | A method and device for calculating similarity between character strings |
CN102214238A (zh) | | A device and method for matching the similarity of Chinese words |
CN111626042A (zh) | | Coreference resolution method and device |
Yilahun et al. | | Entity extraction based on the combination of information entropy and TF-IDF |
JP5097802B2 (ja) | | Japanese automatic recommendation system and method using romaji conversion |
Lone et al. | | Machine intelligence for language translation from Kashmiri to English |
Chen et al. | | A simple and effective unsupervised word segmentation approach |
Sun et al. | | Syntactic parsing of web queries |
Ibrahim et al. | | Bel-Arabi: advanced Arabic grammar analyzer |
Khoufi et al. | | Statistical-based system for morphological annotation of Arabic texts |
Hellwig | | Morphological disambiguation of classical Sanskrit |
Abiderexiti et al. | | Annotation schemes for constructing Uyghur named entity relation corpus |
Elsheikh | | Timeline of the development of Arabic PoS taggers and Morphological analysers |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
PE01 | Entry into force of the registration of the contract for pledge of patent right | Denomination of invention: A method and device for calculating similarity between strings; Effective date of registration: 20220105; Granted publication date: 20190618; Pledgee: China Construction Bank Corp Beijing Zhongguancun branch; Pledgor: RUN TECHNOLOGIES Co.,Ltd. BEIJING; Registration number: Y2022990000005 |
PC01 | Cancellation of the registration of the contract for pledge of patent right | Date of cancellation: 20220712; Granted publication date: 20190618; Pledgee: China Construction Bank Corp Beijing Zhongguancun branch; Pledgor: RUN TECHNOLOGIES Co.,Ltd. BEIJING; Registration number: Y2022990000005 |
PE01 | Entry into force of the registration of the contract for pledge of patent right | Denomination of invention: A method and device for calculating similarity between character strings; Effective date of registration: 20220907; Granted publication date: 20190618; Pledgee: China Construction Bank Corp Beijing Zhongguancun branch; Pledgor: RUN TECHNOLOGIES Co.,Ltd. BEIJING; Registration number: Y2022110000206 |
PC01 | Cancellation of the registration of the contract for pledge of patent right | Granted publication date: 20190618; Pledgee: China Construction Bank Corp Beijing Zhongguancun branch; Pledgor: RUN TECHNOLOGIES Co.,Ltd. BEIJING; Registration number: Y2022110000206 |