JP2007025834A5 - - Google Patents

Publication number
JP2007025834A5
Authority
JP
Japan
Prior art keywords
character string
strings
string
text
completeness
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2005203799A
Other languages
Japanese (ja)
Other versions
JP2007025834A (en)
JP4661415B2 (en)
Filing date
2005-07-13
Publication date
2007-12-27
Application filed
Priority to JP2005203799A
Priority claimed from JP2005203799A
Publication of JP2007025834A
Publication of JP2007025834A5
Application granted
Publication of JP4661415B2
Expired - Fee Related
Anticipated expiration

Claims (4)

A notation fluctuation processing system comprising:
an input means for inputting text data composed of a plurality of character strings;
a means for storing character string decomposition rules defined by patterns of character types and word sequences;
a means for storing a predetermined controlled glossary;
a means for creating a partial character string dictionary by cutting each term in the controlled glossary into its constituents in advance, based on the character string decomposition rules, the dictionary registering, for each cut partial character string, the controlled terms that contain it as candidate terms;
a means for storing the created partial character string dictionary;
a text search means for retrieving every occurrence in the text data of the character strings in the partial character string dictionary;
a means for joining those retrieved partial character strings that share a common candidate term, and for calculating a completeness based on the positional relationship of the joined character strings in the text and their positional relationship within the candidate term;
a means for recording the candidate term and the completeness of each joined character string; and
a means for selecting, from the recorded character strings, a set of character strings for which the sum of the completeness values and the coverage of the text are each equal to or greater than a predetermined value,
wherein, among the selected set of character strings, those whose completeness as a controlled term is equal to or greater than a predetermined threshold are output to an output means.
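The pipeline in claim 1 (decompose glossary terms, index the pieces, find them in the text, join pieces that share a candidate term, score, and filter) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the regex used as the decomposition rule, the function names, and in particular the toy completeness score (the fraction of a term's constituents found in term order). The claim does not specify a formula, and text coverage is not modeled.

```python
import re
from collections import defaultdict

# Hypothetical decomposition rule: runs of ASCII letters, runs of digits,
# or any other single non-space character each form one constituent.
TOKEN = re.compile(r"[A-Za-z]+|[0-9]+|\S")

def decompose(term):
    """Cut a controlled term into constituents."""
    return TOKEN.findall(term)

def build_partial_dictionary(glossary):
    """Map each constituent to the (candidate term, index in term) pairs containing it."""
    partial = defaultdict(list)
    for term in glossary:
        for index, piece in enumerate(decompose(term)):
            partial[piece].append((term, index))
    return partial

def find_all(text, partial):
    """Return every (text offset, piece, candidate term, index) occurrence, sorted by offset."""
    hits = []
    for piece, owners in partial.items():
        start = text.find(piece)
        while start != -1:
            for term, index in owners:
                hits.append((start, piece, term, index))
            start = text.find(piece, start + 1)
    return sorted(hits)

def completeness_by_term(text, glossary):
    """Toy score: fraction of a term's constituents that appear in the text in term order."""
    partial = build_partial_dictionary(glossary)
    by_term = defaultdict(list)
    for offset, _piece, term, index in find_all(text, partial):
        by_term[term].append(index)  # indices arrive in text order
    scores = {}
    for term, indices in by_term.items():
        last, matched = -1, 0
        for index in indices:
            # Join a piece only if it comes later in the term than the previous one.
            if index > last:
                last, matched = index, matched + 1
        scores[term] = matched / len(decompose(term))
    return scores

def select_terms(scores, threshold):
    """Keep candidate terms whose completeness reaches the threshold."""
    return sorted(term for term, score in scores.items() if score >= threshold)
```

With the glossary `["vitamin B12", "vitamin C"]` and the text `"vitamin-B 12 tablets"`, the variant spelling still recovers all three constituents of "vitamin B12", while "vitamin C" matches only one of its two constituents and falls below a threshold such as 0.8.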
The notation fluctuation system according to claim 1,
wherein the completeness of the partial character strings is calculated, based on the text character string lying between the two character strings to be joined, according to the costs in a predefined character string cost table and the number of characters.
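One way to read this gap-based scoring is sketched below in Python. The cost table entries, the per-character default cost, and the formula (1 minus the gap cost, floored at 0) are all assumptions chosen for illustration; the claim states only that a predefined cost table and the number of characters are used.

```python
# Hypothetical cost table: separators that commonly split constituents are cheap.
GAP_COSTS = {"": 0.0, " ": 0.0, "-": 0.05, "・": 0.05}
DEFAULT_COST_PER_CHAR = 0.2  # assumed penalty per character of an unlisted gap

def gap_cost(gap):
    """Cost of the text lying between two joined strings."""
    if gap in GAP_COSTS:
        return GAP_COSTS[gap]
    return DEFAULT_COST_PER_CHAR * len(gap)

def gap_completeness(text, left_span, right_span):
    """Completeness of joining two matches given as (start, end) offsets in text."""
    gap = text[left_span[1]:right_span[0]]
    return max(0.0, 1.0 - gap_cost(gap))
```

Under these assumed values, joining "vitamin" and "B" across a hyphen costs almost nothing, while a long run of unrelated characters between the two pieces drives the completeness to zero.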
The notation fluctuation system according to claim 1,
wherein the partial character string dictionary records completeness values assigned according to a weighting definition in which character string patterns, such as quantity expressions and unit expressions, or literal character strings are registered, and the completeness of a joined character string is calculated as the sum of the completeness values of its partial character strings.
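A weighting definition of this kind might look like the following Python sketch. The patterns, the literal entry, and all weight values are hypothetical; the claim says only that patterns (for example quantity and unit expressions) or literal strings carry weights and that the weights of the joined pieces are summed.

```python
import re

# Hypothetical weighting definition: pattern-based rules and literal strings.
PATTERN_WEIGHTS = [
    (re.compile(r"[0-9]+(\.[0-9]+)?"), 0.25),  # quantity expression
    (re.compile(r"mg|ml|kg|mm"), 0.5),         # unit expression
]
LITERAL_WEIGHTS = {"tablet": 0.5}
DEFAULT_WEIGHT = 1.0  # assumed weight for ordinary content pieces

def piece_weight(piece):
    """Look up the completeness recorded for one partial string."""
    if piece in LITERAL_WEIGHTS:
        return LITERAL_WEIGHTS[piece]
    for pattern, weight in PATTERN_WEIGHTS:
        if pattern.fullmatch(piece):
            return weight
    return DEFAULT_WEIGHT

def summed_completeness(pieces):
    """Completeness of a joined string as the sum of its pieces' weights."""
    return sum(piece_weight(piece) for piece in pieces)
```

The intent of weighting in this sketch is that generic pieces such as numbers and units, which occur in many controlled terms, contribute less evidence than a distinctive piece such as a substance name.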
The notation fluctuation system according to claim 1,
wherein the text search means searches for approximate partial character strings whose Hamming distance is equal to or less than a prespecified value, or whose edit distance is equal to or less than a prespecified value.
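Both distance measures named in this claim are standard, and a brute-force approximate search over text windows can be sketched in Python as follows. The function names and the window-scanning strategy are assumptions for illustration; a production system would likely use an indexed or automaton-based search instead.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))

def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute or match
        prev = cur
    return prev[-1]

def approximate_matches(text, piece, max_hamming=None, max_edit=None):
    """Return sorted (offset, substring) pairs of text within either distance limit."""
    found = set()
    n = len(piece)
    if max_hamming is not None:
        # Hamming distance only compares windows of the same length as the piece.
        for start in range(len(text) - n + 1):
            window = text[start:start + n]
            if hamming(window, piece) <= max_hamming:
                found.add((start, window))
    if max_edit is not None:
        # Edit distance allows windows up to max_edit characters shorter or longer.
        for length in range(max(1, n - max_edit), n + max_edit + 1):
            for start in range(len(text) - length + 1):
                window = text[start:start + length]
                if edit_distance(window, piece) <= max_edit:
                    found.add((start, window))
    return sorted(found)
```

Hamming distance catches substitutions such as "vitemin" for "vitamin", while edit distance additionally catches dropped or inserted characters such as "vitmin".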
JP2005203799A 2005-07-13 2005-07-13 Expression fluctuation processing system Expired - Fee Related JP4661415B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2005203799A JP4661415B2 (en) 2005-07-13 2005-07-13 Expression fluctuation processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2005203799A JP4661415B2 (en) 2005-07-13 2005-07-13 Expression fluctuation processing system

Publications (3)

Publication Number Publication Date
JP2007025834A (en) 2007-02-01
JP2007025834A5 (en) 2007-12-27
JP4661415B2 (en) 2011-03-30

Family

ID=37786539

Family Applications (1)

Application Number Status Publication Priority Date Filing Date Title
JP2005203799A Expired - Fee Related JP4661415B2 (en) 2005-07-13 2005-07-13 Expression fluctuation processing system

Country Status (1)

Country Link
JP (1) JP4661415B2 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2147385A2 (en) * 2007-04-13 2010-01-27 Koninklijke Philips Electronics N.V. Method and system for determining correlation between clinical events
US9342592B2 (en) 2013-07-29 2016-05-17 Workday, Inc. Method for systematic mass normalization of titles
JP6419899B1 (en) * 2017-06-16 2018-11-07 ソフトバンク株式会社 Information processing apparatus, control method, and control program
CN109885180B (en) 2019-02-21 2022-12-06 北京百度网讯科技有限公司 Error correction method and apparatus, computer readable medium
CN110349639B (en) * 2019-07-12 2022-01-04 之江实验室 Multi-center medical term standardization system based on general medical term library
JP7473314B2 (en) 2019-09-27 2024-04-23 TXP Medical株式会社 Medical information management device and method for adding metadata to medical reports

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06274547A (en) * 1993-03-22 1994-09-30 Nippon Telegr & Teleph Corp <Ntt> Compound word recognizing device
JPH10240743A (en) * 1997-03-03 1998-09-11 Nippon Telegr & Teleph Corp <Ntt> Information storage and retrieval method and system therefor
