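CN110210028B - Domain feature word extraction method, apparatus, device and medium for speech-transcribed text - Google Patents
Domain feature word extraction method, apparatus, device and medium for speech-transcribed text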
- Publication number
- CN110210028B (application CN201910466124.1A)
- Authority
- CN
- China
- Prior art keywords
- word
- words
- domain feature
- value
- voice translation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
Concept ID | Concept | Type | Score | Sections | Count |
---|---|---|---|---|---|
238000013519 | translation | Methods | 0.000 | title, claims, abstract, description | 101 |
238000000034 | method | Methods | 0.000 | title, claims, description | 37 |
238000012937 | correction | Methods | 0.000 | claims, abstract, description | 52 |
238000000605 | extraction | Methods | 0.000 | claims, abstract, description | 39 |
238000012545 | processing | Methods | 0.000 | claims, abstract, description | 29 |
239000002131 | composite material | Substances | 0.000 | claims, abstract, description | 16 |
230000011218 | segmentation | Effects | 0.000 | claims, description | 46 |
150000001875 | compounds | Chemical class | 0.000 | claims, description | 21 |
238000001914 | filtration | Methods | 0.000 | claims, description | 17 |
238000012549 | training | Methods | 0.000 | claims, description | 11 |
238000004590 | computer program | Methods | 0.000 | claims, description | 6 |
238000003672 | processing method | Methods | 0.000 | claims, description | 5 |
238000012216 | screening | Methods | 0.000 | claims, description | 4 |
238000005259 | measurement | Methods | 0.000 | claims, description | 3 |
239000012634 | fragment | Substances | 0.000 | description | 8 |
230000000694 | effects | Effects | 0.000 | description | 6 |
238000010586 | diagram | Methods | 0.000 | description | 3 |
230000007547 | defect | Effects | 0.000 | description | 2 |
238000011161 | development | Methods | 0.000 | description | 2 |
230000018109 | developmental process | Effects | 0.000 | description | 2 |
238000005516 | engineering process | Methods | 0.000 | description | 2 |
230000008569 | process | Effects | 0.000 | description | 2 |
238000007476 | Maximum Likelihood | Methods | 0.000 | description | 1 |
230000002411 | adverse | Effects | 0.000 | description | 1 |
230000002776 | aggregation | Effects | 0.000 | description | 1 |
238000004220 | aggregation | Methods | 0.000 | description | 1 |
238000013473 | artificial intelligence | Methods | 0.000 | description | 1 |
230000009286 | beneficial effect | Effects | 0.000 | description | 1 |
238000010276 | construction | Methods | 0.000 | description | 1 |
238000013500 | data storage | Methods | 0.000 | description | 1 |
230000006870 | function | Effects | 0.000 | description | 1 |
230000003993 | interaction | Effects | 0.000 | description | 1 |
239000000463 | material | Substances | 0.000 | description | 1 |
238000010295 | mobile communication | Methods | 0.000 | description | 1 |
238000012986 | modification | Methods | 0.000 | description | 1 |
230000004048 | modification | Effects | 0.000 | description | 1 |
238000003058 | natural language processing | Methods | 0.000 | description | 1 |
230000003287 | optical effect | Effects | 0.000 | description | 1 |
238000011160 | research | Methods | 0.000 | description | 1 |
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/313—Selection or weighting of terms for indexing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3343—Query execution using phonetics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Acoustics & Sound (AREA)
- Artificial Intelligence (AREA)
- Machine Translation (AREA)
Abstract
Description
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910466124.1A (CN110210028B, zh) | 2019-05-30 | 2019-05-30 | Domain feature word extraction method, apparatus, device and medium for speech-transcribed text |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910466124.1A (CN110210028B, zh) | 2019-05-30 | 2019-05-30 | Domain feature word extraction method, apparatus, device and medium for speech-transcribed text |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110210028A (zh) | 2019-09-06 |
CN110210028B (zh) | 2023-04-28 |
Family
ID=67789670
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910466124.1A (Active; CN110210028B, zh) | Domain feature word extraction method, apparatus, device and medium for speech-transcribed text | 2019-05-30 | 2019-05-30 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110210028B (zh) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
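CN110717021B (zh) * | 2019-09-17 | 2023-08-29 | 平安科技(深圳)有限公司 | Method for obtaining input text in an artificial intelligence interview and related apparatus |
CN111078979A (zh) * | 2019-11-29 | 2020-04-28 | 上海观安信息技术股份有限公司 | Method and system for identifying online lending websites based on OCR and text processing technology |
CN111160013B (zh) * | 2019-12-30 | 2023-11-24 | 北京百度网讯科技有限公司 | Text error correction method and apparatus |
CN111460170B (zh) * | 2020-03-27 | 2024-02-13 | 深圳价值在线信息科技股份有限公司 | Word recognition method, apparatus, terminal device and storage medium |
CN111985234B (zh) * | 2020-09-08 | 2022-02-01 | 四川长虹电器股份有限公司 | Speech text error correction method |
CN113486680B (zh) * | 2021-07-23 | 2023-12-15 | 平安科技(深圳)有限公司 | Text translation method, apparatus, device and storage medium |
CN113591440B (zh) * | 2021-07-29 | 2023-08-01 | 百度在线网络技术(北京)有限公司 | Text processing method, apparatus and electronic device |
CN114822527A (zh) * | 2021-10-11 | 2022-07-29 | 北京中电慧声科技有限公司 | Error correction method and apparatus for speech-to-text, electronic device and storage medium |
CN114330336A (zh) * | 2021-11-19 | 2022-04-12 | 福建亿榕信息技术有限公司 | New word discovery method and apparatus based on left and right information entropy and mutual information |
CN114912437B (zh) * | 2022-04-29 | 2024-07-19 | 上海交通大学 | Method, system, terminal and medium for detecting and extracting kaomoji in bullet-screen comments |
CN117763153B (zh) * | 2024-02-22 | 2024-04-30 | 大汉软件股份有限公司 | Method and system for discovering new words in a thematic corpus |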
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2008014702A1 (fr) * | 2006-07-25 | 2008-02-07 | Beijing Sogou Technology Development Co., Ltd. | Method and system for extracting new words |
CN106528532A (zh) * | 2016-11-07 | 2017-03-22 | 上海智臻智能网络科技股份有限公司 | Text error correction method, apparatus and terminal |
CN107608963A (zh) * | 2017-09-12 | 2018-01-19 | 马上消费金融股份有限公司 | Chinese error correction method, apparatus, device and storage medium based on mutual information |
CN108804512A (zh) * | 2018-04-20 | 2018-11-13 | 平安科技(深圳)有限公司 | Apparatus and method for generating a text classification model, and computer-readable storage medium |
CN108804617A (zh) * | 2018-05-30 | 2018-11-13 | 广州杰赛科技股份有限公司 | Domain term extraction method, apparatus, terminal device and storage medium |
- 2019-05-30: CN application CN201910466124.1A filed; patent CN110210028B (zh), status Active
Also Published As
Publication number | Publication date |
---|---|
CN110210028A (zh) | 2019-09-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
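CN110210028B (zh) | | Domain feature word extraction method, apparatus, device and medium for speech-transcribed text |
CN112560912B (zh) | | Classification model training method, apparatus, electronic device and storage medium |
CN109241524B (zh) | | Semantic parsing method and apparatus, computer-readable storage medium, and electronic device |
CN108363790B (zh) | | Method, apparatus, device and storage medium for evaluating comments |
CN109960724B (zh) | | Text summarization method based on TF-IDF |
CN107301170B (zh) | | Artificial-intelligence-based method and apparatus for segmenting sentences |
CN111125349A (zh) | | Graph-model text summary generation method based on word frequency and semantics |
CN111967262A (zh) | | Method and apparatus for determining entity labels |
CN104899190B (zh) | | Method and apparatus for generating a word segmentation dictionary, and word segmentation processing method and apparatus |
CN113053367B (zh) | | Speech recognition method, and model training method and apparatus for speech recognition |
CN106528532A (zh) | | Text error correction method, apparatus and terminal |
CN106897439A (zh) | | Text sentiment recognition method, apparatus, server and storage medium |
CN104111925B (zh) | | Item recommendation method and apparatus |
CN110717340B (zh) | | Recommendation method, apparatus, electronic device and storage medium |
CN106445915B (zh) | | New word discovery method and apparatus |
CN108304377B (zh) | | Long-tail word extraction method and related apparatus |
CN111177375B (zh) | | Electronic document classification method and apparatus |
CN112528653B (zh) | | Short text entity recognition method and system |
CN110245361B (zh) | | Phrase pair extraction method, apparatus, electronic device and readable storage medium |
CN111325033A (zh) | | Entity recognition method, apparatus, electronic device and computer-readable storage medium |
CN109298796B (zh) | | Word association method and apparatus |
CN110874408A (zh) | | Model training method, text recognition method, apparatus and computing device |
CN116932736A (zh) | | Patent recommendation method based on user requirements combined with an inverted index |
CN109871540A (zh) | | Text similarity calculation method and related device |
CN114595684A (zh) | | Summary generation method, apparatus, electronic device and storage medium |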
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB03 | Change of inventor or designer information | Inventor after: Ji Wang, Chen Mo, Cheng Wei, Qiu Xiaxia, Qian Yan. Inventor before: Ji Wang, Chen Mo, Cheng Wei, Qiu Xiaxia, Qian Yan. |
CB02 | Change of applicant information | Address after: 23/F, World Trade Center, 857 Xincheng Road, Binjiang District, Hangzhou City, Zhejiang Province, 310051. Applicant after: Hangzhou Yuanchuan Xinye Technology Co.,Ltd. Address before: 23/F, World Trade Center, 857 Xincheng Road, Binjiang District, Hangzhou City, Zhejiang Province, 310051. Applicant before: Hangzhou Yuanchuan New Technology Co.,Ltd. |
GR01 | Patent grant | |