CN101836205A - Domain dictionary creation - Google Patents

Domain dictionary creation

Info

Publication number
CN101836205A
Authority
CN
China
Prior art keywords
corpus
word
speech
candidate
descriptor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN200880112723A
Other languages
English (en)
Chinese (zh)
Inventor
吴军
唐溪柳
洪锋
王咏刚
杨波
张蕾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
Google LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2007-08-23
Filing date
2008-08-25
Publication date
2010-09-15
Priority claimed from US11/844,067 (external priority; published as US7983902B2)
Priority claimed from US11/844,153 (external priority; published as US7917355B2)
Application filed by Google LLC
Publication of CN101836205A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Machine Translation (AREA)

Applications Claiming Priority (5)

US11/844,153, priority date 2007-08-23
US11/844,067 (published as US7983902B2, en), priority date 2007-08-23, filed 2007-08-23: Domain dictionary creation by detection of new topic words using divergence value comparison
US11/844,153 (published as US7917355B2, en), priority date 2007-08-23, filed 2007-08-23: Word detection
US11/844,067, priority date 2007-08-23
PCT/CN2008/072128 (published as WO2009026850A1, en), priority date 2007-08-23, filed 2008-08-25: Domain dictionary creation
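This page gives only the title of US7983902B2, not its method. To illustrate what "divergence value comparison" can mean in general, here is a minimal Python sketch under stated assumptions. It scores each word by its pointwise Kullback-Leibler contribution between a topic corpus and a background corpus and flags high-scoring words that are missing from an existing dictionary. The function names, the add-one smoothing, and the threshold rule are hypothetical choices for illustration, not the patented procedure.

```python
import math
from collections import Counter


def divergence_scores(topic_tokens, background_tokens):
    """Hypothetical scorer: for each word in the topic corpus, return
    p_topic(w) * log(p_topic(w) / p_background(w)), i.e. the word's
    pointwise contribution to KL(topic || background).
    Add-one smoothing on the background keeps the log finite."""
    topic_counts = Counter(topic_tokens)
    background_counts = Counter(background_tokens)
    vocab = set(topic_counts) | set(background_counts)
    topic_total = sum(topic_counts.values())
    # Add-one smoothing: every vocabulary word gets one extra pseudo-count.
    background_total = sum(background_counts.values()) + len(vocab)
    scores = {}
    for word, count in topic_counts.items():
        p_topic = count / topic_total
        p_background = (background_counts[word] + 1) / background_total
        scores[word] = p_topic * math.log(p_topic / p_background)
    return scores


def new_topic_words(topic_tokens, background_tokens, dictionary, threshold):
    """Hypothetical rule: a word counts as a new topic word when its
    divergence score exceeds `threshold` and it is not already in `dictionary`."""
    scores = divergence_scores(topic_tokens, background_tokens)
    return sorted(w for w, s in scores.items() if s > threshold and w not in dictionary)


if __name__ == "__main__":
    topic = "kernel patch kernel driver the the of".split()
    background = "the the the of of a kernel".split()
    print(new_topic_words(topic, background, {"kernel"}, 0.05))  # → ['driver', 'patch']
```

In this toy run, common words such as "the" receive negative scores because they are at least as frequent in the background corpus. "kernel" scores highest but is skipped because it is already in the dictionary.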

Publications (1)

Publication Number Publication Date
CN101836205A (zh) 2010-09-15

Family

Family ID: 40386710

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200880112723A Pending CN101836205A (zh) 2007-08-23 2008-08-25 Domain dictionary creation

Country Status (3)

Country Publication
JP (1) JP5379138B2
CN (1) CN101836205A
WO (1) WO2009026850A1

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011120211A1 (en) 2010-03-29 2011-10-06 Nokia Corporation Method and apparatus for seeded user interest modeling
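CN102236639B (zh) * 2010-04-28 2016-08-10 Samsung Electronics Co., Ltd. System and method for updating a language model
US9069798B2 (en) * 2012-05-24 2015-06-30 Mitsubishi Electric Research Laboratories, Inc. Method of text classification using discriminative topic transformation
CN105956359B (zh) * 2016-04-15 2018-06-05 陈杰 Method for mapping and translating drug item names between heterogeneous systems
CN106682128A (zh) * 2016-12-13 2017-05-17 Chengdu Shulian Mingpin Technology Co., Ltd. Method for automatically constructing multi-domain dictionaries
CN113780007B (zh) * 2021-10-22 2025-01-21 Ping An Technology (Shenzhen) Co., Ltd. Corpus screening method, intent recognition model optimization method, device, and storage medium
CN115858787B (zh) * 2022-12-12 2023-08-01 Research Institute of Highway, Ministry of Transport Hotspot extraction and mining method based on issue and complaint information in highway transportation
CN116911321B (zh) * 2023-06-21 2024-05-14 Three Gorges High-Tech Information Technology Co., Ltd. Method and component for automatically translating dictionary values on the front end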

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
JP2883153B2 (ja) * 1990-04-02 1999-04-19 Ricoh Co., Ltd. Keyword extraction device
US6167368A (en) * 1998-08-14 2000-12-26 The Trustees Of Columbia University In The City Of New York Method and system for indentifying significant topics of a document
US6651058B1 (en) * 1999-11-15 2003-11-18 International Business Machines Corporation System and method of automatic discovery of terms in a document that are relevant to a given target topic
GB2399427A (en) * 2003-03-12 2004-09-15 Canon Kk Apparatus for and method of summarising text
JP4254623B2 (ja) * 2004-06-09 2009-04-15 NEC Corporation Topic analysis method, apparatus therefor, and program
JP5259919B2 (ja) * 2005-07-21 2013-08-07 Daikin Industries, Ltd. Axial flow fan
US7813919B2 (en) * 2005-12-20 2010-10-12 Xerox Corporation Class description generation for clustering and categorization

Cited By (9)

Publication number Priority date Publication date Assignee Title
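CN102411563A (zh) * 2010-09-26 2012-04-11 Alibaba Group Holding Ltd. Method, apparatus and system for identifying target words
CN110347931A (zh) * 2013-06-06 2019-10-18 Tencent Technology (Shenzhen) Co., Ltd. Method and apparatus for detecting new chapters of an article
CN108170294A (zh) * 2013-08-08 2018-06-15 Alibaba Group Holding Ltd. Vocabulary display and field conversion methods, client, electronic device, and computer storage medium
CN103970730A (zh) * 2014-04-29 2014-08-06 Hohai University Method for extracting multiple topic words from a single Chinese text
CN108027822A (zh) * 2015-04-21 2018-05-11 LexisNexis, a division of Reed Elsevier Inc. Systems and methods for generating concepts from a document corpus
CN107045871A (zh) * 2016-02-05 2017-08-15 Google Inc. Re-recognizing speech using external data sources
CN107045871B (zh) * 2016-02-05 2020-09-15 Google LLC Re-recognizing speech using external data sources
CN107704102A (zh) * 2017-10-09 2018-02-16 Beijing Xinmei Hutong Technology Co., Ltd. Text input method and apparatus
CN107704102B (zh) * 2017-10-09 2021-08-03 Beijing Xinmei Hutong Technology Co., Ltd. Text input method and apparatus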

Also Published As

Publication number Publication date
JP2010537286A (ja) 2010-12-02
JP5379138B2 (ja) 2013-12-25
WO2009026850A1 (en) 2009-03-05

Similar Documents

Publication Publication Date Title
US8386240B2 (en) Domain dictionary creation by detection of new topic words using divergence value comparison
JP5379138B2 (ja) Creation of a domain dictionary
KR101465770B1 (ko) Word probability determination
US7917355B2 (en) Word detection
CN102272754B (zh) Custom language models
CN101779200B (zh) Method and device for determining dictionary words and phrases
CN102124459B (zh) Dictionary word and phrase determination
US8688727B1 (en) Generating query refinements
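CN102439540B (zh) Input method editor
CN111324771A (zh) Video tag determination method and apparatus, electronic device, and storage medium
CN110162771B (zh) Method and apparatus for recognizing event trigger words, and electronic device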
Golpar-Rabooki et al. Feature extraction in opinion mining through Persian reviews
Sharma et al. Word prediction system for text entry in Hindi
Hemmer et al. Estimating post-ocr denoising complexity on numerical texts
Pratama et al. A comparison of the use of several different resources on lexicon based Indonesian sentiment analysis on app review dataset
Rezai et al. FarsiTag: A part-of-speech tagging system for Persian
CN112949287A (zh) Hot word mining method and system, computer device, and storage medium
Mirzababaei et al. Discriminative reranking for context-sensitive spell–checker
Bhuyan et al. Context-Based Clustering of Assamese Words using N-gram Model
Bhatia et al. Predictive and corrective text input for desktop editor using n-grams and suffix trees
Pantel et al. Smart selection
Krug et al. Detecting character references in literary novels using a two stage contextual deep learning approach
Duan et al. Error Checking for Chinese Query by Mining Web Log

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 2010-09-15