CN112016306A - Text similarity calculation method based on part-of-speech alignment - Google Patents
Text similarity calculation method based on part-of-speech alignment
- Publication number
- CN112016306A
- Authority
- CN
- China
- Prior art keywords
- speech
- participle
- alignment
- sentence
- participles
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
Description
Claims (9)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010887857.5A CN112016306B (zh) | 2020-08-28 | 2020-08-28 | Text similarity calculation method based on part-of-speech alignment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010887857.5A CN112016306B (zh) | 2020-08-28 | 2020-08-28 | Text similarity calculation method based on part-of-speech alignment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112016306A (zh) | 2020-12-01 |
CN112016306B (zh) | 2023-10-20 |
Family
ID=73503917
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010887857.5A Active CN112016306B (zh) | Text similarity calculation method based on part-of-speech alignment | 2020-08-28 | 2020-08-28 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112016306B (zh) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130179169A1 (en) * | 2012-01-11 | 2013-07-11 | National Taiwan Normal University | Chinese text readability assessing system and method |
CN104679728A (zh) * | 2015-02-06 | 2015-06-03 | China Agricultural University | A text similarity detection method |
US20170068665A1 (en) * | 2014-03-07 | 2017-03-09 | National Institute Of Information And Communications Technology | Word alignment score computing apparatus, word alignment apparatus, and computer program |
CN109492213A (zh) * | 2017-09-11 | 2019-03-19 | Alibaba Group Holding Limited | Sentence similarity calculation method and apparatus |
CN110348007A (zh) * | 2019-06-14 | 2019-10-18 | Beijing QIYI Century Science & Technology Co., Ltd. | A text similarity determination method and apparatus |
- 2020-08-28: CN application CN202010887857.5A, patent CN112016306B (zh), status Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130179169A1 (en) * | 2012-01-11 | 2013-07-11 | National Taiwan Normal University | Chinese text readability assessing system and method |
US20170068665A1 (en) * | 2014-03-07 | 2017-03-09 | National Institute Of Information And Communications Technology | Word alignment score computing apparatus, word alignment apparatus, and computer program |
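CN104679728A (zh) * | 2015-02-06 | 2015-06-03 | China Agricultural University | A text similarity detection method |
CN109492213A (zh) * | 2017-09-11 | 2019-03-19 | Alibaba Group Holding Limited | Sentence similarity calculation method and apparatus |
CN110348007A (zh) * | 2019-06-14 | 2019-10-18 | Beijing QIYI Century Science & Technology Co., Ltd. | A text similarity determination method and apparatus |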
Non-Patent Citations (2)
Title |
---|
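Xia Zhiming; Liu Xin: "A semantics-based Chinese text similarity algorithm", Computer and Modernization, no. 04, pages 6-9 * |
Yin Baosheng; Yang Yang: "A word alignment algorithm combining a bidirectional dictionary with semantic similarity computation", Journal of Shenyang Aerospace University, no. 02, pages 69-76 * |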
Also Published As
Publication number | Publication date |
---|---|
CN112016306B (zh) | 2023-10-20 |
Similar Documents
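Publication | Title | Publication Date |
---|---|---|
US20230016365A1 (en) | Method and apparatus for training text classification model | |
CN108682417B (zh) | Small-data speech acoustic modeling method in speech recognition | |
CN112990296B (zh) | Image-text matching model compression and acceleration method and system based on orthogonal similarity distillation | |
WO2020143163A1 (zh) | Named entity recognition method and apparatus based on attention mechanism, and computer device | |
CN109902307A (zh) | Named entity recognition method, and training method and apparatus for named entity recognition model | |
CN111062217B (zh) | Language information processing method and apparatus, storage medium, and electronic device | |
CN101295295A (zh) | Chinese lexical analysis method based on linear model | |
CN112686040B (zh) | An event factuality detection method based on graph recurrent neural network | |
EP4394759A1 (en) | Artificial intelligence-based audio processing method and apparatus, electronic device, computer program product, and computer-readable storage medium | |
CN110705253A (zh) | Burmese dependency parsing method and apparatus based on transfer learning | |
CN111489746A (zh) | A BERT-based method for constructing a language model for power grid dispatching speech recognition | |
CN113672731A (zh) | Sentiment analysis method, apparatus, device, and storage medium based on domain information | |
CN113657098A (zh) | Text error correction method, apparatus, device, and storage medium | |
WO2023045186A1 (zh) | Intent recognition method and apparatus, electronic device, and storage medium | |
CN114398900A (zh) | A long-text semantic similarity calculation method based on the RoBERTa model | |
CN112183060B (zh) | Coreference resolution method for multi-turn dialogue systems | |
CN112016306B (zh) | Text similarity calculation method based on part-of-speech alignment | |
Xie et al. | L2 mispronunciation verification based on acoustic phone embedding and siamese networks | |
Stahlberg et al. | Word segmentation and pronunciation extraction from phoneme sequences through cross-lingual word-to-phoneme alignment | |
CN110826329A (zh) | A perplexity-based automatic essay scoring method | |
CN115525749A (zh) | Voice question answering method and apparatus, electronic device, and storage medium | |
Yousif | Neural computing based part of speech tagger for Arabic language: a review study | |
WO2022251720A1 (en) | Character-level attention neural networks | |
CN116090449A (zh) | An entity relation extraction method and system for quality problem analysis reports | |
CN115310432A (zh) | A typo detection and correction method | |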
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20240318. Address after: No. 88-1101 Zhongshan Road, Gejiu City, Honghe Hani and Yi Autonomous Prefecture, Yunnan Province, 661099. Patentee after: Gejiu City Radio and Television Information Network Technology Co.,Ltd. Country or region after: China. Address before: Room B2-4, 3rd Floor, Building 11, Internet Industrial Park, No. 106 Jinkai Avenue West Section, Yubei District, Chongqing, 400000. Patentee before: CHONGQING XIEZHI TECHNOLOGY Co.,Ltd. Country or region before: China |