JP4986919B2 - Full-form lexicon with tagged data and methods of constructing and using the same - Google Patents
Full-form lexicon with tagged data and methods of constructing and using the same
- Publication number
- JP4986919B2
- Authority
- JP
- Japan
- Prior art keywords
- word
- information
- lexicon
- storing
- words
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/268—Morphological analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/232—Orthographic correction, e.g. spell checking or vowelisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
- Calculators And Similar Devices (AREA)
Description
130 System memory
134 Operating system
135 Application programs
136 Other program modules
137 Program data
140 Non-removable non-volatile memory interface
144 Operating system
145 Application programs
146 Other program modules
147 Program data
150 Removable non-volatile memory interface
160 User input interface
161 Pointing device
162 Keyboard
163 Microphone
170 Network interface
171 Local area network
172 Modem
173 Wide area network
180 Remote computer
185 Remote application programs
190 Video interface
191 Monitor
195 Output peripheral interface
196 Printer
197 Speakers
202 Processor
204 Memory
208 Communication interface
214 Application(s)
216 Object store
302 Input
304 Output
306 Text analyzer
308 Lexicon
Claims (3)
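- A method of constructing, by a computer, a lexicon for use in language processing, the method comprising,
for each word extracted from input text received by the computer:
generating spelling information for the word contained in the input text and storing it in the lexicon;
when the word is a compound word composed of at least two separate constituent words, generating dynamic segmentation information, which is used in recognizing valid compound words in a selected language and which indicates the relative positions of the at least two separate constituent words within the compound word, and storing it in the lexicon;
generating part-of-speech information representing the part of speech of the word and storing it in the lexicon, wherein, when the word has a plurality of parts of speech, the part-of-speech information represents the plurality of parts of speech;
generating lemma delta information for transforming the word into a second word and storing it in the lexicon, wherein the lemma delta information includes an operation code indicating an operation to be performed on the word in order to transform the word into the second word, and an argument value indicating the characters within the word on which the operation is to be performed;
generating description information about the word contained in the input text, the description information describing grammatical features of the word or whether the word is part of a term representing a name, and storing it in the lexicon; and
when the word contained in the input text is the compound word, generating static segmentation mask information including values indicating the length of each constituent word making up the compound word, and storing the static segmentation mask information in the lexicon,
wherein the lexicon is constructed by performing the above steps, and the constructed lexicon is stored in a storage device of the computer that is accessible from the computer and is usable by the computer to perform language processing that generates an output from a new input containing one or more words.
- The method of claim 1, further comprising generating an intermediate index table that stores, for words in the input text that have a plurality of parts of speech, probability information indicating the relative probability that each word is used as each individual part of speech.
- The method of claim 1, further comprising:
when the computer receives new input text, selecting, from the words contained in the new input text, new words that are not currently in the lexicon; and
updating the lexicon by performing, for each of the selected words, the step of generating and storing the spelling information, the step of generating and storing the dynamic segmentation information, the step of generating and storing the part-of-speech information, the step of generating and storing the lemma delta information, the step of generating and storing the description information, and the step of generating and storing the static segmentation mask information.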
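The per-word data named in the claims can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not the patented implementation: the field names, the two operation codes `CUT_TAIL` and `ADD_TAIL`, the position labels used for dynamic segmentation, and the sample words are all hypothetical. The claims require only an operation code with an argument value for the lemma delta, and a list of constituent lengths for the static segmentation mask.

```python
from dataclasses import dataclass

# Hypothetical operation codes: the claims require only "an operation code"
# plus "an argument value"; these two names are invented for illustration.
CUT_TAIL = "cut_tail"   # argument: number of trailing characters to remove
ADD_TAIL = "add_tail"   # argument: characters to append


@dataclass(frozen=True)
class LexiconEntry:
    spelling: str                                  # spelling information
    parts_of_speech: tuple                         # one or more part-of-speech tags
    lemma_delta: tuple = ()                        # (opcode, argument) pairs
    description: frozenset = frozenset()           # e.g. grammatical features
    dynamic_segmentation: frozenset = frozenset()  # assumed position labels
    static_mask: tuple = ()                        # constituent lengths (compounds)


def apply_lemma_delta(word, delta):
    """Transform a word into its second word by running each operation in order."""
    for opcode, arg in delta:
        if opcode == CUT_TAIL:
            word = word[:len(word) - arg]
        elif opcode == ADD_TAIL:
            word = word + arg
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return word


def split_by_mask(word, mask):
    """Split a compound into constituents using the stored constituent lengths."""
    if sum(mask) != len(word):
        raise ValueError("mask lengths must add up to the word length")
    parts, start = [], 0
    for length in mask:
        parts.append(word[start:start + length])
        start += length
    return parts


def update_lexicon(lexicon, words, make_entry):
    """Add entries only for words that are not yet in the lexicon (as in claim 3)."""
    added = []
    for word in words:
        if word not in lexicon:
            lexicon[word] = make_entry(word)
            added.append(word)
    return added


# Sample entries; the words and tags are illustrative only.
lexicon = {
    "studies": LexiconEntry(
        "studies", ("noun", "verb"),
        lemma_delta=((CUT_TAIL, 3), (ADD_TAIL, "y")),
    ),
    "doghouse": LexiconEntry(
        "doghouse", ("noun",),
        dynamic_segmentation=frozenset({"first", "last"}),
        static_mask=(3, 5),
    ),
}
```

Storing the lemma as a short list of edit operations rather than as a second full string keeps each entry compact, and the static mask lets a consumer recover the constituents of a known compound by slicing, without searching the lexicon again.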
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US51392103P | 2003-10-23 | 2003-10-23 | |
US60/513,921 | 2003-10-23 | ||
US10/804,998 | 2004-03-19 | ||
US10/804,998 US7421386B2 (en) | 2003-10-23 | 2004-03-19 | Full-form lexicon with tagged data and methods of constructing and using the same |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004275875A Division JP4676181B2 (ja) | 2003-10-23 | 2004-09-22 | Full-form lexicon with tagged data and methods of constructing and using the same |
Publications (2)
Publication Number | Publication Date |
---|---|
JP2008234680A (ja) | 2008-10-02 |
JP4986919B2 (ja) | 2012-07-25 |
Family
ID=34396615
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004275875A Expired - Fee Related JP4676181B2 (ja) | 2003-10-23 | 2004-09-22 | Full-form lexicon with tagged data and methods of constructing and using the same |
JP2008117038A Expired - Fee Related JP4986919B2 (ja) | 2003-10-23 | 2008-04-28 | Full-form lexicon with tagged data and methods of constructing and using the same |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004275875A Expired - Fee Related JP4676181B2 (ja) | 2003-10-23 | 2004-09-22 | Full-form lexicon with tagged data and methods of constructing and using the same |
Country Status (7)
Country | Link |
---|---|
US (1) | US7421386B2 (ja) |
EP (1) | EP1526464B1 (ja) |
JP (2) | JP4676181B2 (ja) |
KR (1) | KR101130384B1 (ja) |
CN (1) | CN1670728A (ja) |
AT (1) | ATE401609T1 (ja) |
DE (1) | DE602004015039D1 (ja) |
2004
- 2004-03-19 US US10/804,998, patent US7421386B2: Expired - Fee Related (not active)
- 2004-09-02 AT AT04020874T, patent ATE401609T1: IP Right Cessation (not active)
- 2004-09-02 EP EP04020874A, patent EP1526464B1: Expired - Lifetime (not active)
- 2004-09-02 DE DE602004015039T, patent DE602004015039D1: Expired - Lifetime (not active)
- 2004-09-03 KR KR1020040070523A, patent KR101130384B1: IP Right Grant (active)
- 2004-09-22 JP JP2004275875A, patent JP4676181B2: Expired - Fee Related (not active)
- 2004-10-25 CN CNA2004100877109A, patent CN1670728A: Pending (active)
2008
- 2008-04-28 JP JP2008117038A, patent JP4986919B2: Expired - Fee Related (not active)
Also Published As
Publication number | Publication date |
---|---|
KR20050039540A (ko) | 2005-04-29 |
US7421386B2 (en) | 2008-09-02 |
US20050091031A1 (en) | 2005-04-28 |
JP2008234680A (ja) | 2008-10-02 |
JP2005129030A (ja) | 2005-05-19 |
EP1526464B1 (en) | 2008-07-16 |
JP4676181B2 (ja) | 2011-04-27 |
EP1526464A1 (en) | 2005-04-27 |
DE602004015039D1 (de) | 2008-08-28 |
CN1670728A (zh) | 2005-09-21 |
ATE401609T1 (de) | 2008-08-15 |
KR101130384B1 (ko) | 2012-06-27 |
Legal Events
Code | Title | Description |
---|---|---|
A131 | Notification of reasons for refusal | Free format text: JAPANESE INTERMEDIATE CODE: A131; Effective date: 2010-12-03 |
A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 2011-03-02 |
A02 | Decision of refusal | Free format text: JAPANESE INTERMEDIATE CODE: A02; Effective date: 2011-09-22 |
A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 2012-01-23 |
RD13 | Notification of appointment of power of sub attorney | Free format text: JAPANESE INTERMEDIATE CODE: A7433; Effective date: 2012-01-24 |
A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A821; Effective date: 2012-01-24 |
A911 | Transfer to examiner for re-examination before appeal (zenchi) | Free format text: JAPANESE INTERMEDIATE CODE: A911; Effective date: 2012-02-13 |
TRDD | Decision of grant or rejection written | |
A01 | Written decision to grant a patent or to grant a registration (utility model) | Free format text: JAPANESE INTERMEDIATE CODE: A01; Effective date: 2012-04-17 |
A01 | Written decision to grant a patent or to grant a registration (utility model) | Free format text: JAPANESE INTERMEDIATE CODE: A01 |
A61 | First payment of annual fees (during grant procedure) | Free format text: JAPANESE INTERMEDIATE CODE: A61; Effective date: 2012-04-24 |
R150 | Certificate of patent or registration of utility model | Free format text: JAPANESE INTERMEDIATE CODE: R150; Ref document number: 4986919; Country of ref document: JP |
FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 2015-05-11; Year of fee payment: 3 |
S111 | Request for change of ownership or part of ownership | Free format text: JAPANESE INTERMEDIATE CODE: R313113 |
R350 | Written notification of registration of transfer | Free format text: JAPANESE INTERMEDIATE CODE: R350 |
R250 | Receipt of annual fees | Free format text: JAPANESE INTERMEDIATE CODE: R250 |
R250 | Receipt of annual fees | Free format text: JAPANESE INTERMEDIATE CODE: R250 |
R250 | Receipt of annual fees | Free format text: JAPANESE INTERMEDIATE CODE: R250 |
R250 | Receipt of annual fees | Free format text: JAPANESE INTERMEDIATE CODE: R250 |
R250 | Receipt of annual fees | Free format text: JAPANESE INTERMEDIATE CODE: R250 |
R250 | Receipt of annual fees | Free format text: JAPANESE INTERMEDIATE CODE: R250 |
LAPS | Cancellation because of no payment of annual fees | |