JP5479581B2 - Mining phrase pairs from an unstructured resource - Google Patents

Mining phrase pairs from an unstructured resource

Info

Publication number
JP5479581B2
JP5479581B2 JP2012511920A JP2012511920A JP5479581B2 JP 5479581 B2 JP5479581 B2 JP 5479581B2 JP 2012511920 A JP2012511920 A JP 2012511920A JP 2012511920 A JP2012511920 A JP 2012511920A JP 5479581 B2 JP5479581 B2 JP 5479581B2
Authority
JP
Japan
Prior art keywords
result
items
translation model
query
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
JP2012511920A
Other languages
English (en)
Japanese (ja)
Other versions
JP2012527701A5
JP2012527701A (ja)
Inventor
William B. Dolan
Christopher J. Brockett
Julio J. Castillo
Lucretia H. Vanderwende
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Corp
Original Assignee
Microsoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Corp
Publication of JP2012527701A
Publication of JP2012527701A5
Application granted
Publication of JP5479581B2
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/44 Statistical methods, e.g. probability models
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/49 Data-driven translation using very large corpora, e.g. the web

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
JP2012511920A 2009-05-22 2010-05-14 Mining phrase pairs from an unstructured resource Expired - Fee Related JP5479581B2 (ja)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US12/470,492 US20100299132A1 (en) 2009-05-22 2009-05-22 Mining phrase pairs from an unstructured resource
US12/470,492 2009-05-22
PCT/US2010/035033 WO2010135204A2 (en) 2009-05-22 2010-05-14 Mining phrase pairs from an unstructured resource

Publications (3)

Publication Number Publication Date
JP2012527701A (ja) 2012-11-08
JP2012527701A5 2013-06-27
JP5479581B2 (ja) 2014-04-23

Family

ID=43125158

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2012511920A Expired - Fee Related JP5479581B2 (ja) 2009-05-22 2010-05-14 Mining phrase pairs from an unstructured resource

Country Status (8)

Country Link
US (1) US20100299132A1
EP (1) EP2433230A4
JP (1) JP5479581B2
KR (1) KR101683324B1
CN (1) CN102439596B
BR (1) BRPI1011214A2
CA (1) CA2758632C
WO (1) WO2010135204A2

Families Citing this family (47)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110015921A1 (en) * 2009-07-17 2011-01-20 Minerva Advisory Services, Llc System and method for using lingual hierarchy, connotation and weight of authority
US8861844B2 (en) 2010-03-29 2014-10-14 Ebay Inc. Pre-computing digests for image similarity searching of image-based listings in a network-based publication system
US9792638B2 (en) 2010-03-29 2017-10-17 Ebay Inc. Using silhouette images to reduce product selection error in an e-commerce environment
US8412594B2 (en) 2010-08-28 2013-04-02 Ebay Inc. Multilevel silhouettes in an online shopping environment
US9064004B2 (en) * 2011-03-04 2015-06-23 Microsoft Technology Licensing, Llc Extensible surface for consuming information extraction services
CN102789461A (zh) * 2011-05-19 2012-11-21 Fujitsu Limited Multilingual dictionary construction device and multilingual dictionary construction method
US8909516B2 (en) * 2011-10-27 2014-12-09 Microsoft Corporation Functionality for normalizing linguistic items
US8914371B2 (en) 2011-12-13 2014-12-16 International Business Machines Corporation Event mining in social networks
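KR101359718B1 (ko) * 2012-05-17 2014-02-13 POSTECH Academy-Industry Foundation Dialogue management system and method
CN102779186B (zh) * 2012-06-29 2014-12-24 Zhejiang University Whole-process modeling method for unstructured data management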
US9183197B2 (en) 2012-12-14 2015-11-10 Microsoft Technology Licensing, Llc Language processing resources for automated mobile language translation
EP2989596A4 (en) * 2013-04-27 2016-10-05 Datafission Corporation CONTENT BASED SEARCH ENGINE FOR PROCESSING UNSTRUCTURED DIGITAL DATA
US20140350931A1 (en) * 2013-05-24 2014-11-27 Microsoft Corporation Language model trained using predicted queries from statistical machine translation
EP3084618B1 (en) * 2013-12-19 2021-07-28 Intel Corporation Method and apparatus for communicating between companion devices
US9881006B2 (en) * 2014-02-28 2018-01-30 Paypal, Inc. Methods for automatic generation of parallel corpora
US9740687B2 (en) 2014-06-11 2017-08-22 Facebook, Inc. Classifying languages for objects and entities
US20160012124A1 (en) * 2014-07-10 2016-01-14 Jean-David Ruvini Methods for automatic query translation
CN104462229A (zh) * 2014-11-13 2015-03-25 Soochow University Event classification method and device
US9864744B2 (en) * 2014-12-03 2018-01-09 Facebook, Inc. Mining multi-lingual data
US9830386B2 (en) 2014-12-30 2017-11-28 Facebook, Inc. Determining trending topics in social media
US10067936B2 (en) 2014-12-30 2018-09-04 Facebook, Inc. Machine translation output reranking
US9830404B2 (en) 2014-12-30 2017-11-28 Facebook, Inc. Analyzing language dependency structures
US9477652B2 (en) 2015-02-13 2016-10-25 Facebook, Inc. Machine learning dialect identification
US10114817B2 (en) * 2015-06-01 2018-10-30 Microsoft Technology Licensing, Llc Data mining multilingual and contextual cognates from user profiles
US20170024701A1 (en) * 2015-07-23 2017-01-26 Linkedin Corporation Providing recommendations based on job change indications
US9734142B2 (en) 2015-09-22 2017-08-15 Facebook, Inc. Universal translation
US9990361B2 (en) * 2015-10-08 2018-06-05 Facebook, Inc. Language independent representations
US10586168B2 (en) 2015-10-08 2020-03-10 Facebook, Inc. Deep translations
US9747281B2 (en) 2015-12-07 2017-08-29 Linkedin Corporation Generating multi-language social network user profiles by translation
US10133738B2 (en) 2015-12-14 2018-11-20 Facebook, Inc. Translation confidence scores
US9734143B2 (en) 2015-12-17 2017-08-15 Facebook, Inc. Multi-media context language processing
US10002125B2 (en) 2015-12-28 2018-06-19 Facebook, Inc. Language model personalization
US9805029B2 (en) 2015-12-28 2017-10-31 Facebook, Inc. Predicting future translations
US9747283B2 (en) 2015-12-28 2017-08-29 Facebook, Inc. Predicting future translations
US10902215B1 (en) 2016-06-30 2021-01-26 Facebook, Inc. Social hash for language models
US10902221B1 (en) 2016-06-30 2021-01-26 Facebook, Inc. Social hash for language models
CN106960041A (zh) * 2017-03-28 2017-07-18 Shanxi Tongfang Zhiwang Digital Publishing Technology Co., Ltd. Knowledge structuring method based on unbalanced data
US10380249B2 (en) 2017-10-02 2019-08-13 Facebook, Inc. Predicting future trending topics
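KR102100951B1 (ko) * 2017-11-16 2020-04-14 MindsLab Inc. System for generating question-answer data for machine reading comprehension
CN110110078B (zh) * 2018-01-11 2024-04-30 Beijing Sogou Technology Development Co., Ltd. Data processing method and device, and device for data processing
CN110472251B (zh) * 2018-05-10 2023-05-30 Tencent Technology (Shenzhen) Co., Ltd. Translation model training method, sentence translation method, device, and storage medium
CN109033303B (zh) * 2018-07-17 2021-07-02 Southeast University Large-scale knowledge graph fusion method based on reduced anchors
CN111971686A (zh) * 2018-12-12 2020-11-20 Microsoft Technology Licensing, LLC Automatically generating training data sets for object recognition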
US11664010B2 (en) 2020-11-03 2023-05-30 Florida Power & Light Company Natural language domain corpus data set creation based on enhanced root utterances
CN113010643B (zh) * 2021-03-22 2023-07-21 Ping An Technology (Shenzhen) Co., Ltd. Method, apparatus, device, and storage medium for processing Buddhist-domain vocabulary
US11656881B2 (en) 2021-10-21 2023-05-23 Abbyy Development Inc. Detecting repetitive patterns of user interface actions
US20250077800A1 (en) * 2023-09-06 2025-03-06 7299362 Canada Inc. (O/A Alexa Translations) System and method for rule-based language translation

Family Cites Families (58)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0856175A4 (en) * 1995-08-16 2000-05-24 Univ Syracuse SYSTEM AND METHOD FOR RETURNING MULTI-LANGUAGE DOCUMENTS USING A SEMANTIC VECTOR COMPARISON
JP3614618B2 (ja) * 1996-07-05 2005-01-26 Hitachi, Ltd. Document search support method and device, and document search service using the same
US6076051A (en) * 1997-03-07 2000-06-13 Microsoft Corporation Information retrieval utilizing semantic representation of text
US6266642B1 (en) * 1999-01-29 2001-07-24 Sony Corporation Method and portable apparatus for performing spoken language translation
US6442524B1 (en) * 1999-01-29 2002-08-27 Sony Corporation Analyzing inflectional morphology in a spoken language translation system
US6243669B1 (en) * 1999-01-29 2001-06-05 Sony Corporation Method and apparatus for providing syntactic analysis and data structure for translation knowledge in example-based language translation
US6924828B1 (en) * 1999-04-27 2005-08-02 Surfnotes Method and apparatus for improved information representation
JP2001043236A (ja) * 1999-07-30 2001-02-16 Matsushita Electric Ind Co Ltd Similar word extraction method, document retrieval method, and devices used therefor
US6757646B2 (en) * 2000-03-22 2004-06-29 Insightful Corporation Extended functionality for an inverse inference engine based web search
US20070027672A1 (en) * 2000-07-31 2007-02-01 Michel Decary Computer method and apparatus for extracting data from web pages
AU2002232928A1 (en) * 2000-11-03 2002-05-15 Zoesis, Inc. Interactive character system
JP2002245070A (ja) * 2001-02-20 2002-08-30 Hitachi Ltd Data display method and device, and medium storing a processing program therefor
US7711547B2 (en) * 2001-03-16 2010-05-04 Meaningful Machines, L.L.C. Word association method and apparatus
US7191115B2 (en) * 2001-06-20 2007-03-13 Microsoft Corporation Statistical method and apparatus for learning translation relationships among words
CN1535433A (zh) * 2001-07-04 2004-10-06 Cogisum Intermedia AG Category-based, extensible, interactive document retrieval system
US7340388B2 (en) * 2002-03-26 2008-03-04 University Of Southern California Statistical translation using a large monolingual corpus
US7620538B2 (en) * 2002-03-26 2009-11-17 University Of Southern California Constructing a translation lexicon from comparable, non-parallel corpora
US7031911B2 (en) * 2002-06-28 2006-04-18 Microsoft Corporation System and method for automatic detection of collocation mistakes in documents
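JP2004252495A (ja) * 2002-09-19 2004-09-09 Advanced Telecommunication Research Institute International Method and apparatus for generating training data for training a statistical machine translation device, paraphrasing device, method for training the paraphrasing device, and data processing system and computer program therefor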
US7194455B2 (en) * 2002-09-19 2007-03-20 Microsoft Corporation Method and system for retrieving confirming sentences
US7249012B2 (en) * 2002-11-20 2007-07-24 Microsoft Corporation Statistical method and apparatus for learning translation relationships among phrases
EP1588283A2 (en) * 2002-11-22 2005-10-26 Transclick, Inc. System and method for language translation via remote devices
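JP2004206517A (ja) * 2002-12-26 2004-07-22 Nifty Corp Hot keyword presentation method and hot site presentation method
CN1290036C (zh) * 2002-12-30 2006-12-13 International Business Machines Corporation Computer system and method for building conceptual knowledge from a machine-readable dictionary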
US7346487B2 (en) * 2003-07-23 2008-03-18 Microsoft Corporation Method and apparatus for identifying translations
US7584092B2 (en) * 2004-11-15 2009-09-01 Microsoft Corporation Unsupervised learning of paraphrase/translation alternations and selective application thereof
US7412385B2 (en) * 2003-11-12 2008-08-12 Microsoft Corporation System for identifying paraphrases using machine translation
US7698125B2 (en) * 2004-03-15 2010-04-13 Language Weaver, Inc. Training tree transducers for probabilistic operations
US8296127B2 (en) * 2004-03-23 2012-10-23 University Of Southern California Discovery of parallel text portions in comparable collections of corpora and training using comparable texts
US20050216253A1 (en) * 2004-03-25 2005-09-29 Microsoft Corporation System and method for reverse transliteration using statistical alignment
US7593843B2 (en) * 2004-03-30 2009-09-22 Microsoft Corporation Statistical language model for logical form using transfer mappings
US7620539B2 (en) * 2004-07-12 2009-11-17 Xerox Corporation Methods and apparatuses for identifying bilingual lexicons in comparable corpora using geometric processing
US7698124B2 (en) * 2004-11-04 2010-04-13 Microsoft Corporation Machine translation system incorporating syntactic dependency treelets into a statistical framework
US7546235B2 (en) * 2004-11-15 2009-06-09 Microsoft Corporation Unsupervised learning of paraphrase/translation alternations and selective application thereof
US7552046B2 (en) * 2004-11-15 2009-06-23 Microsoft Corporation Unsupervised learning of paraphrase/translation alternations and selective application thereof
US20060224579A1 (en) * 2005-03-31 2006-10-05 Microsoft Corporation Data mining techniques for improving search engine relevance
US7813918B2 (en) * 2005-08-03 2010-10-12 Language Weaver, Inc. Identifying documents which form translated pairs, within a document collection
US20070043553A1 (en) * 2005-08-16 2007-02-22 Microsoft Corporation Machine translation models incorporating filtered training data
US8312021B2 (en) * 2005-09-16 2012-11-13 Palo Alto Research Center Incorporated Generalized latent semantic analysis
US7937265B1 (en) * 2005-09-27 2011-05-03 Google Inc. Paraphrase acquisition
US7908132B2 (en) * 2005-09-29 2011-03-15 Microsoft Corporation Writing assistance using machine translation techniques
US8943080B2 (en) * 2006-04-07 2015-01-27 University Of Southern California Systems and methods for identifying parallel documents and sentence fragments in multilingual document collections
US9020804B2 (en) * 2006-05-10 2015-04-28 Xerox Corporation Method for aligning sentences at the word level enforcing selective contiguity constraints
US7949514B2 (en) * 2007-04-20 2011-05-24 Xerox Corporation Method for building parallel corpora
US10460327B2 (en) * 2006-07-28 2019-10-29 Palo Alto Research Center Incorporated Systems and methods for persistent context-aware guides
US20080040339A1 (en) * 2006-08-07 2008-02-14 Microsoft Corporation Learning question paraphrases from log data
GB2444084A (en) * 2006-11-23 2008-05-28 Sharp Kk Selecting examples in an example based machine translation system
US8447589B2 (en) * 2006-12-22 2013-05-21 Nec Corporation Text paraphrasing method and program, conversion rule computing method and program, and text paraphrasing system
US8244521B2 (en) * 2007-01-11 2012-08-14 Microsoft Corporation Paraphrasing the web by search-based data collection
US8332207B2 (en) * 2007-03-26 2012-12-11 Google Inc. Large language models in machine translation
US9002869B2 (en) * 2007-06-22 2015-04-07 Google Inc. Machine translation for query expansion
US7983903B2 (en) * 2007-09-07 2011-07-19 Microsoft Corporation Mining bilingual dictionaries from monolingual web pages
US20090119090A1 (en) * 2007-11-01 2009-05-07 Microsoft Corporation Principled Approach to Paraphrasing
US8209164B2 (en) * 2007-11-21 2012-06-26 University Of Washington Use of lexical translations for facilitating searches
US20090182547A1 (en) * 2008-01-16 2009-07-16 Microsoft Corporation Adaptive Web Mining of Bilingual Lexicon for Query Translation
US8326630B2 (en) * 2008-08-18 2012-12-04 Microsoft Corporation Context based online advertising
US8306806B2 (en) * 2008-12-02 2012-11-06 Microsoft Corporation Adaptive web mining of bilingual lexicon
US8352321B2 (en) * 2008-12-12 2013-01-08 Microsoft Corporation In-text embedded advertising

Also Published As

Publication number Publication date
CN102439596B (zh) 2015-07-22
EP2433230A4 (en) 2017-11-15
WO2010135204A3 (en) 2011-02-17
CN102439596A (zh) 2012-05-02
US20100299132A1 (en) 2010-11-25
KR101683324B1 (ko) 2016-12-06
KR20120026063A (ko) 2012-03-16
CA2758632A1 (en) 2010-11-25
WO2010135204A2 (en) 2010-11-25
CA2758632C (en) 2016-08-30
BRPI1011214A2 (pt) 2016-03-15
JP2012527701A (ja) 2012-11-08
EP2433230A2 (en) 2012-03-28

Similar Documents

Publication Publication Date Title
JP5479581B2 (ja) Mining phrase pairs from an unstructured resource
Resnik et al. The web as a parallel corpus
US9633309B2 (en) Displaying quality of question being asked a question answering system
US9727637B2 (en) Retrieving text from a corpus of documents in an information handling system
AU2012235939B2 (en) Real-time automated interpretation of clinical narratives
US9684647B2 (en) Domain-specific computational lexicon formation
US9684714B2 (en) Using paraphrase metrics for answering questions
WO2015179643A1 (en) Systems and methods for generating summaries of documents
Abidin et al. Text stemming and lemmatization of regional languages in Indonesia: a systematic literature review
US20150356181A1 (en) Effectively Ingesting Data Used for Answering Questions in a Question and Answer (QA) System
Bernardini et al. Old needs, new solutions: comparable corpora for language professionals
CN114141384A (zh) 用于检索医学数据的方法、设备和介质
Lango et al. Semi-automatic construction of word-formation networks
Kyjánek et al. Universal derivations kickoff: A collection of harmonized derivational resources for eleven languages
Raja et al. Exploring Edit Distance for Normalising Out-of-Vocabulary Malay Words on Social Media
Gupta et al. Natural language processing algorithms for domain-specific data extraction in material science: Reseractor
Weiss An exploration of pattern mining with ChatGPT
Blancafort et al. TTC Web platform: from corpus compilation to bilingual terminologies for MT and CAT tools
Chaichi et al. Deploying natural language processing to extract key product features of crowdfunding campaigns: the case of 3D printing technologies on kickstarter
Sridhar et al. A Scalable Approach to Building a Parallel Corpus from the Web.
Sheng et al. Coherence and Salience-Based Multi-Document Relationship Mining
Tufiş Finding Translation Examples for Under-Resourced Language Pairs or for Narrow Domains; the Case for Machine Translation
Artetxe et al. Adding syntactic structure to bilingual terminology for improved domain adaptation
Yunus Semantic Query Translation for Quran Content Retrieval
Mohd Amin Semantic query translation for Quran content retrieval/Mohd Amin Mohd Yunus

Legal Events

Date Code Title Description
A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20130507

A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20130507

RD03 Notification of appointment of power of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7423

Effective date: 20130712

RD04 Notification of resignation of power of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7424

Effective date: 20130719

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20131224

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20140114

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20140212

R150 Certificate of patent or registration of utility model

Ref document number: 5479581

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150

S111 Request for change of ownership or part of ownership

Free format text: JAPANESE INTERMEDIATE CODE: R313113

R350 Written notification of registration of transfer

Free format text: JAPANESE INTERMEDIATE CODE: R350

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

LAPS Cancellation because of no payment of annual fees