JP2009223463A - Synonymy determination apparatus, method therefor, program, and recording medium - Google Patents

Info

Publication number
JP2009223463A
JP2009223463A JP2008065256A JP2008065256A JP2009223463A JP 2009223463 A JP2009223463 A JP 2009223463A JP 2008065256 A JP2008065256 A JP 2008065256A JP 2008065256 A JP2008065256 A JP 2008065256A JP 2009223463 A JP2009223463 A JP 2009223463A
Authority
JP
Japan
Prior art keywords
synonym
complement
reading
notation
syllable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2008065256A
Other languages
Japanese (ja)
Other versions
JP5094486B2 (en)
Inventor
Izumi Takahashi
いづみ 高橋
Hisako Asano
久子 浅野
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP2008065256A
Publication of JP2009223463A
Application granted
Publication of JP5094486B2
Expired - Fee Related
Anticipated expiration

Abstract

PROBLEM TO BE SOLVED: To determine with high accuracy the synonymy of character string expressions that involve addition of a fixed-form character string, notation conversion that preserves the reading, abbreviation, or the like.

SOLUTION: A synonym candidate pair generating section 1 analyzes input text, extracts synonym candidate expressions from the text on the basis of the analysis result, and attaches the corresponding analysis result to each. After normalizing the notation and reading of the synonym candidate expressions using an inverse transformation rule 3 and a syllable normalization rule 4, it combines the expressions to generate synonym candidate pairs, each consisting of two synonym candidate expressions. A synonymy determination section 2 uses a syllable similarity table 5 and an abbreviation determination model 6 to determine whether the synonym candidate expressions in a pair are synonymous, and outputs the pair as a synonym pair if they are.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a technique for determining the synonymy of character string expressions contained in text. More specifically, it relates to a technique that extracts synonym candidate expressions (character string expressions that are candidates for synonyms) from a text, generates synonym candidate pairs each consisting of two synonym candidate expressions, and determines whether the two expressions in each pair are synonymous, that is, whether they refer to the same information.

Synonymy determination is used, for example, to detect spelling errors in text editors. In addition, the synonyms it yields can be aggregated into a synonym dictionary, which can then be incorporated into a search device and used for query expansion, among other uses.

Note that "PlayStation", "プレイステーション", and "プレーステーション", which appear in the specification and drawings, are registered trademarks, and "アップル社" (Apple Inc.), "ペヨンジュン" (Bae Yong-joon), "木村拓哉" (Takuya Kimura), "安藤美姫" (Miki Ando), "安めぐみ" (Megumi Yasu), and "ハリーポッター" (Harry Potter) are the names of well-known companies, people, characters, and the like. Because this application is an invention in language processing, inserting the characters "(registered trademark)" or changing the notation would alter the meaning, so they are written as they are.

When a language is used to express a given thing or event, many different expressions can be chosen, so within text written in that language the same information appears in multiple different expressions (character string expressions). Moreover, the more text there is and the more people who write it, the more variations there are for a single piece of information. To collect all instances of the same information in text without omission, a synonymy determination method is therefore needed that decides whether two character string expressions refer to the same information. The granularity of "character string expressions referring to the same information" can range from whole documents to single words; synonymy determination in the present invention operates on nouns and compound nouns.

Conventional methods for synonymy determination at the granularity of nouns and compound nouns fall broadly into two groups: identification methods and generation methods. An identification method extracts synonym candidate expressions from arbitrary text, generates synonym candidate pairs, and determines whether each pair is synonymous (see, for example, Non-Patent Document 1). A generation method generates every expression that could be a synonym candidate for a given character string expression, and in some cases confirms after generation, using the Web or the like, that the expression actually exists (see, for example, Non-Patent Document 2).

Both kinds of method proceed by creating synonym candidate pairs and then narrowing them down to the correct ones (synonymy determination). The coverage of obtainable synonyms depends on the pair generation method, and the accuracy depends on both the pair generation and the narrowing methods. If more diverse synonym candidate pairs (differing in character type, abbreviation, orthographic variation, and so on) are generated at the outset, coverage widens but accuracy drops. To raise accuracy, one must either restrict the types of synonym at pair generation time and collect only the more probable synonym candidate pairs, or adopt a narrowing (synonymy determination) method that remains accurate even for diverse synonyms.

Identification methods collect synonym candidates in one of two ways. The first uses meta-information such as table structure, tags and symbols, or special expressions such as "○○こと××" ("○○, also known as ××") (Non-Patent Document 3). The second exploits orthographic similarity, restricting the target to a specific type such as abbreviations or katakana spelling variants and collecting candidates by pattern matching against the characteristics of that type (Non-Patent Document 1). Both secure a certain level of accuracy by imposing constraints, such as meta-information or the type of notation, when pairs are generated.

A generation method generates every expression that could be a synonym candidate for a given text expression, and in some cases confirms after generation, using the Web or the like, that the expression actually exists (for example, Non-Patent Document 2). Because generation methods produce synonym candidate pairs with heuristic rules or probabilistic models, the synonym candidates they can generate are limited to specific types such as abbreviations and katakana spelling variants.

Non-Patent Document 1: Hiroyuki Sakai, Shigeru Masuyama, "Automatic Acquisition of Correspondence between Nouns and Abbreviations from Corpus," Proceedings of the 9th Annual Meeting of the Association for Natural Language Processing, 2003, pp. 226-229
Non-Patent Document 2: Norifumi Murayama, Manabu Okuyama, "Automatic Abbreviation Estimation Using a Noisy-channel Model," Proceedings of the 12th Annual Meeting of the Association for Natural Language Processing, 2006, pp. 763-766
Non-Patent Document 3: Tsunehito Seki, Kazutaka Shimada, Tsutomu Endo, "Synonym Extraction Using Table Structure," Proceedings of the 11th Annual Meeting of the Association for Natural Language Processing, 2005, C1-6

Among conventional identification methods, those that collect synonym candidate pairs using meta-information cannot use anything other than notations written in a special descriptive format as synonym candidate pairs. The other methods are specialized for a specific type (abbreviations or katakana spelling variants), so what they can obtain is limited to that type. In both cases coverage is narrow.

In generation methods, as described above, synonym candidate pairs are generated either by heuristic rules or by a probabilistic model. The former is costly and has narrow coverage; the latter has extremely low accuracy.

Synonyms include cases where greater orthographic similarity means a higher likelihood of synonymy (as with orthographic variants), cases where synonymy cannot be gauged from orthographic similarity alone (as with abbreviations), and cases that have both properties. The coverage of conventional methods is narrow because each method has had to specialize in a specific type of synonym.

However, if one looks at how synonyms arise, the many types can be reduced to three main causes: (a) addition of a fixed-form character string, (b) notation conversion that preserves the reading, and (c) abbreviation. These three can occur individually or together, which is what makes synonyms so diverse (seven patterns: a, b, c, a+b, a+c, b+c, and a+b+c).
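The count of seven follows from taking every non-empty combination of the three causes. A minimal sketch of that enumeration (the function name `cause_patterns` is introduced here for illustration only):

```python
from itertools import combinations

# (a) fixed-form string addition, (b) reading-preserving notation conversion,
# (c) abbreviation
CAUSES = ("a", "b", "c")

def cause_patterns(causes=CAUSES):
    """Return every non-empty combination of causes, smallest combinations first."""
    return [
        "+".join(combo)
        for size in range(1, len(causes) + 1)
        for combo in combinations(causes, size)
    ]

patterns = cause_patterns()
print(patterns)
```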

The character strings added in (a) are fixed-form expressions such as affixes (for example "ちゃん" and "ティ") or particular symbols. Deleting the added fixed-form string therefore restores the notation that existed before the synonym was generated. In (b), the notation is converted while the reading is preserved, as when "PlayStation" becomes "プレイステーション" or "プレーステーション"; if readings are assigned to both synonyms, the readings are identical or very similar. In (c), characters are deleted while the character order is preserved, as when "国際連合" (United Nations) becomes "国連"; the longer expression therefore contains the shorter one, and there is a degree of regularity in which characters are deleted.
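The containment described for (c) can be read as an ordered-subsequence test: the short form must be obtainable from the long form by deleting characters without reordering them. The following is a sketch under that reading; the function name is hypothetical and the patent text does not prescribe this exact check.

```python
def is_ordered_subsequence(short: str, long: str) -> bool:
    """True if `short` can be obtained from `long` by deleting characters
    while keeping the remaining characters in their original order."""
    remaining = iter(long)
    # `ch in remaining` advances the iterator, so order is enforced.
    return all(ch in remaining for ch in short)

print(is_ordered_subsequence("国連", "国際連合"))
print(is_ordered_subsequence("連国", "国際連合"))
```

Passing this test only makes a pair a candidate for abbreviation; whether it really is one is a separate decision.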

Accordingly, the notation is first normalized by deleting the fixed-form strings added in (a), and the notation converted in (b) is normalized by assigning it a reading. If the resulting notations or readings are identical or very similar, the pair is determined to be synonymous. Then, after the variation arising from (a) and (b) has been absorbed, it is determined whether the abbreviation of (c) has occurred. Performing the determination in this order makes it possible to handle all of these diverse synonyms and thus to widen coverage.

The present invention was made in view of the above problems. When collecting synonym candidates, it generates synonym candidate pairs from all combinations of the nouns in one text (a body of writing containing at least one sentence), regardless of the type of synonym. To extend the range over which synonymy can be determined for these diverse pairs to almost all types, each expression is normalized with respect to both notation and reading.

Because synonym candidate pairs are generated from all combinations of nouns, they can be generated independently of how the text is written. In addition, restricting generation to a single text prevents the generation of unrelated pairs that span different texts and merely have similar notation.
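A minimal sketch of exhaustive pairing restricted to one text, assuming the candidate expressions have already been extracted per text. The input layout (a mapping from a text ID to a list of expressions) and the function name are assumptions made for this illustration.

```python
from itertools import combinations

def generate_candidate_pairs(candidates_by_text):
    """Pair every two distinct candidates that occur in the same text.

    candidates_by_text: dict mapping a text ID to the list of candidate
    expressions extracted from that text (hypothetical input layout).
    """
    pairs = []
    for text_id, candidates in candidates_by_text.items():
        unique = list(dict.fromkeys(candidates))  # drop repeats, keep order
        pairs.extend((text_id, a, b) for a, b in combinations(unique, 2))
    return pairs

example = {
    "text1": ["PlayStation", "プレイステーション", "プレーステーション", "PlayStation"],
    "text2": ["国際連合", "国連"],
}
pairs = generate_candidate_pairs(example)
for pair in pairs:
    print(pair)
```

With n distinct candidates in a text this yields n(n-1)/2 pairs, and no pair ever mixes expressions from two different texts.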

At synonymy determination time, each expression in a generated synonym candidate pair is normalized with respect to both notation and reading, and conditions are tested to decide which type of synonym the pair is. Separating the pairs by type before determination makes it possible to apply a synonymy determination method suited to each type and thereby improve accuracy.

Furthermore, because the synonymy determination method of the present invention handles almost all types of synonym (abbreviations, katakana spelling variants, and so on), it can be applied whatever method was used to collect the synonym candidates, and it can be used in combination with existing synonym candidate collection methods.

The present invention is characterized in that, when text is input, synonym candidate pairs are generated from the nouns and compound nouns it contains; the notation and reading of each word in a pair are normalized; pairs whose normalized notations are identical or whose normalized readings are very similar are determined to be synonymous; and pairs not so determined, but in which one expression contains the other, are passed to a classifier that determines whether they are synonyms.

According to the present invention, the synonymy of diverse character string expressions involving (a) addition of a fixed-form character string, (b) notation conversion that preserves the reading, (c) abbreviation, or the like can be determined with high accuracy.

The present invention is described in detail below with reference to the illustrated embodiment.

FIG. 1 shows an example of an embodiment of the synonymy determination apparatus of the present invention. The apparatus comprises a synonym candidate pair generation unit 1, a synonymy determination unit 2, an inverse conversion rule storage unit 3, a syllable normalization rule storage unit 4, a syllable similarity table storage unit 5, an abbreviation determination model storage unit 6, an analysis processing result table storage unit 7, and a synonym candidate pair list storage unit 8.

The synonym candidate generation unit 1 receives text that is entered directly from a keyboard or the like (not shown), read from a storage medium, or received from another device via a communication medium. Taking one text (text data corresponding to a body of writing containing at least one sentence) as the unit of processing, it performs analysis such as well-known morphological analysis and named entity extraction, extracts synonym candidate expressions from the text on the basis of the analysis results, and stores each synonym candidate expression together with its analysis result in the analysis processing result table storage unit 7. It then normalizes the notation of each synonym candidate expression using the inverse conversion rules stored in the inverse conversion rule storage unit 3, normalizes the reading using the syllable normalization rules stored in the syllable normalization rule storage unit 4, and stores the results in the analysis processing result table storage unit 7. Finally, it combines the synonym candidate expressions stored in the analysis processing result table storage unit 7 exhaustively to create synonym candidate pairs, and stores them in the synonym candidate pair list storage unit 8.

The synonymy determination unit 2 takes a synonym candidate pair from the synonym candidate pair list storage unit 8 and determines, as described below, whether the two synonym candidate expressions in the pair are synonymous. For this it uses the analysis results and normalization results for each expression stored in the analysis processing result table storage unit 7, the syllable similarity table stored in the syllable similarity table storage unit 5, and the abbreviation determination model stored in the abbreviation determination model storage unit 6. If the expressions are synonymous, the pair is output as a synonym pair. The same is repeated for every synonym candidate pair stored in the synonym candidate pair list storage unit 8, and a synonym pair list is output.

Synonymy is determined as follows. First, the pair is tested for whether the normalized notations are exactly identical. If it is not determined to be synonymous at this stage, it is tested for whether the normalized readings are similar: the similarity of the normalized readings is computed using the syllable similarity table stored in the syllable similarity table storage unit 5, and the pair is determined to be synonymous if the similarity is at or above a predetermined value. If the pair is still not determined to be synonymous and the notations or readings are in an inclusion relation, the abbreviation determination model stored in the abbreviation determination model storage unit 6 is used to determine whether the pair is in an abbreviation relation.

The inverse conversion rule storage unit 3 stores inverse conversion rules for normalizing the notation of a synonym candidate expression and, in step with that, normalizing its reading. The rules include at least rules that delete affixes, such as prefixes and suffixes, that are commonly attached when a noun is used, and rules that delete the repetition used to form nicknames.
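As a rough illustration of what such inverse conversion rules might look like for notation, the sketch below strips a suffix and collapses an exact doubling. The suffix list reuses the two affix examples given earlier in this description ("ちゃん", "ティ"); everything else (function names, the doubling heuristic) is an assumption for illustration, not the rule set actually stored in unit 3.

```python
# Illustrative suffixes only; the real rule set is not given in this section.
SUFFIXES = ("ちゃん", "ティ")

def strip_suffix(notation: str) -> str:
    """Delete one known suffix, unless the suffix is the whole string."""
    for suffix in SUFFIXES:
        if notation.endswith(suffix) and len(notation) > len(suffix):
            return notation[: -len(suffix)]
    return notation

def collapse_repetition(notation: str) -> str:
    """Reduce an exact doubling such as XX to X (naive nickname heuristic)."""
    half, remainder = divmod(len(notation), 2)
    if remainder == 0 and half > 0 and notation[:half] == notation[half:]:
        return notation[:half]
    return notation

def inverse_convert(notation: str) -> str:
    """Apply the suffix rule, then the repetition rule."""
    return collapse_repetition(strip_suffix(notation))

print(inverse_convert("ミキティ"))
print(inverse_convert("ミキミキ"))
```

Rules this naive would also rewrite ordinary words that merely end in a listed suffix or happen to be doubled, so a practical rule set would need conditions (for example on part of speech) that this sketch omits. A parallel set of rules would be needed for the reading.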

The syllable normalization rule storage unit 4 stores syllable normalization rules that are applied to vowel sequences, long vowels, and geminate consonants (sokuon) in the reading of a synonym candidate expression. They normalize at least the difference in the unit of reading length between native Japanese words and loanwords (mora versus syllable), colloquial forms, and variation arising in transliteration.
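One way to picture syllable normalization on katakana readings is sketched below: the long-vowel mark and the small "ッ" are dropped, and a vowel kana that merely lengthens the preceding vowel is dropped as well. The specific rules, the kana tables, and the function name are assumptions chosen for illustration; the actual rules in unit 4 are described later in the patent and may differ.

```python
# Katakana grouped by vowel (covers common kana only; illustrative, not exhaustive).
ROWS = {
    "a": "アカサタナハマヤラワガザダバパァャ",
    "i": "イキシチニヒミリギジヂビピィ",
    "u": "ウクスツヌフムユルグズヅブプゥュ",
    "e": "エケセテネヘメレゲゼデベペェ",
    "o": "オコソトノホモヨロヲゴゾドボポォョ",
}
VOWEL_OF = {kana: vowel for vowel, row in ROWS.items() for kana in row}

# Vowel kana treated as lengthening the preceding vowel (assumed rule).
LENGTHENERS = {"a": "ア", "i": "イ", "u": "ウ", "e": "エイ", "o": "オウ"}

def normalize_reading(reading: str) -> str:
    """Drop long-vowel marks, sokuon, and vowel kana that only lengthen."""
    out = []
    for ch in reading:
        if ch in "ーッ":
            continue
        prev_vowel = VOWEL_OF.get(out[-1]) if out else None
        if prev_vowel is not None and ch in LENGTHENERS[prev_vowel]:
            continue
        out.append(ch)
    return "".join(out)

print(normalize_reading("プレイステーション"))
print(normalize_reading("プレーステーション"))
```

Both readings collapse to the same string, which is the effect the normalization is after. The rules here are deliberately aggressive and would also merge readings that differ only in vowel length.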

The syllable similarity table storage unit 5 stores a syllable similarity table used to compute the similarity between the readings of the two synonym candidate expressions in a pair. Its keys are "syllable pairs that are written differently but read similarly" and its values are "distance (similarity)".
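A table of this shape can be sketched as a dictionary keyed by unordered syllable pairs. The entries, their values, the averaging formula, and the assumption that readings are already segmented into syllables are all placeholders introduced for this example; they are not data or formulas from the patent.

```python
# Placeholder entries: unordered syllable pair -> similarity in [0, 1].
SYLLABLE_SIMILARITY = {
    frozenset(("バ", "ヴァ")): 0.9,
    frozenset(("チ", "ティ")): 0.8,
}

def syllable_similarity(s1: str, s2: str) -> float:
    """1.0 for identical syllables, the table value if listed, else 0.0."""
    if s1 == s2:
        return 1.0
    return SYLLABLE_SIMILARITY.get(frozenset((s1, s2)), 0.0)

def reading_similarity(r1, r2) -> float:
    """Mean per-syllable similarity of two pre-segmented readings.

    Returns 0.0 when the syllable counts differ (a simplifying assumption).
    """
    if not r1 or len(r1) != len(r2):
        return 0.0
    return sum(syllable_similarity(a, b) for a, b in zip(r1, r2)) / len(r1)

score = reading_similarity(["バ", "イ", "オ", "リ", "ン"], ["ヴァ", "イ", "オ", "リ", "ン"])
print(score)
```

The score would then be compared against the predetermined value mentioned in the description; what that value is, and how unequal-length readings are aligned, is left open here.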

The abbreviation determination model storage unit 6 stores an abbreviation determination model for determining whether the two synonym candidate expressions in a pair are in an abbreviation relation. The model is generated in advance by machine learning and determines whether two words are in an abbreviation relation.

Each of the units described above is explained in more detail below. Note that in the following description the contents of storage units 3 to 8 are sometimes referred to by the reference numeral of the storage unit itself.

[Synonym candidate generation unit 1]
FIG. 2 shows the details of the synonym candidate generation unit 1, which comprises an analysis processing unit 11, a normalization processing unit 12, and a pair generation unit 13. Taking one text as input, the analysis processing unit 11 performs text analysis such as morphological analysis and named entity extraction, and synonym candidate expressions are cut out on the basis of the results. The normalization processing unit 12 then normalizes the synonym candidate expressions, after which the pair generation unit 13 generates synonym candidate pairs. This embodiment is described taking named entities as the synonym candidate expressions.

The analysis processing unit 11 analyzes the text using established, well-known techniques such as morphological analysis and named entity extraction, and extracts synonym candidate expressions. Morphological analysis annotates the text with information such as morphemes (notation), readings, and parts of speech (including named entity classes). Simple normalization, such as unifying half-width and full-width characters in the text, is also carried out at this point. For the reading, either the single most likely reading or the N-best readings may be used. Where the notation is an unknown word, such as one written in the alphabet, and morphological analysis alone does not assign the correct reading, the reading is reassigned using a method that correctly estimates the readings of unknown words such as alphabetic words (see, for example, Japanese Patent Application Laid-Open No. 2001-142877, titled "Alphabetic character/Japanese reading association device and method, alphabetic word transliteration device and method, and recording medium storing the processing program").
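The half-width/full-width unification mentioned above could, for instance, be done with Unicode NFKC normalization from the Python standard library. This is one possible choice made for illustration; the text does not say how the unification is implemented, and NFKC also folds other compatibility characters beyond width variants.

```python
import unicodedata

def unify_width(text: str) -> str:
    """Fold full-width ASCII (and other compatibility forms) via NFKC."""
    return unicodedata.normalize("NFKC", text)

print(unify_width("ＰｌａｙＳｔａｔｉｏｎ３"))
```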

Synonym candidate expressions are cut out by named entity extraction and written, together with their analysis results (notation, reading, part of speech, and so on), to the synonym candidate expression column and the analysis result column of the analysis processing result table 7. If a synonym candidate expression with exactly the same notation already exists in table 7, nothing is written, so that records are not duplicated. Morpheme boundary information is retained in each of the notation, reading, and part of speech, using a symbol such as "/". FIG. 3 shows an example of the analysis processing result table 7. At this point only the synonym candidate expression column and the analysis result column are filled in; the other columns are empty.
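The table described here can be pictured as a keyed collection of records with de-duplication on the expression and "/"-joined morpheme fields. The class and field names below are hypothetical, the example part-of-speech labels are invented, and FIG. 3 (not reproduced in this section) may lay the columns out differently.

```python
from dataclasses import dataclass

@dataclass
class CandidateRecord:
    candidate_id: int
    expression: str            # synonym candidate expression column
    notation: str              # analysis result: morphemes joined by "/"
    reading: str               # analysis result: readings joined by "/"
    pos: str                   # analysis result: parts of speech joined by "/"
    notation_normalized: str = ""          # filled in by later steps
    reading_normalized: str = ""           # filled in by later steps
    notation_reading_normalized: str = ""  # filled in by later steps

class AnalysisResultTable:
    def __init__(self):
        self._by_expression = {}

    def add(self, morphemes):
        """Add a candidate given as (notation, reading, pos) tuples per morpheme.

        Returns the existing record if the same expression was already stored.
        """
        expression = "".join(m[0] for m in morphemes)
        if expression in self._by_expression:
            return self._by_expression[expression]
        record = CandidateRecord(
            candidate_id=len(self._by_expression) + 1,
            expression=expression,
            notation="/".join(m[0] for m in morphemes),
            reading="/".join(m[1] for m in morphemes),
            pos="/".join(m[2] for m in morphemes),
        )
        self._by_expression[expression] = record
        return record

    def records(self):
        return list(self._by_expression.values())

table = AnalysisResultTable()
first = table.add([("国際", "コクサイ", "名詞"), ("連合", "レンゴウ", "名詞")])
again = table.add([("国際", "コクサイ", "名詞"), ("連合", "レンゴウ", "名詞")])
print(first.notation, first.reading, len(table.records()))
```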

For the purposes of explanation the expressions cut out here are named entities, but nouns and compound nouns obtained from the morphological analysis results may also be used as synonym candidate expressions. The resulting analysis processing result table 7 contains one record for each distinct synonym candidate expression in the text.

The processing flow in the normalization processing unit 12 is described with reference to FIG. 4. The normalization processing unit 12 normalizes the notation and the reading of each extracted synonym candidate expression. Its input is the list of analysis result columns of all records in the analysis processing result table 7 created by the analysis processing unit 11, and the following processing is repeated for each record in the list. When all records have been processed, processing in the normalization processing unit 12 ends.

(Step s12-1) If the notation in the analysis result column of the analysis processing result table 7 is alphabetic, upper and lower case are unified to upper case and the column is overwritten. Proceed to step s12-2.

(Step s12-2) The notation rules and reading rules of the inverse conversion rules stored in the inverse conversion rule storage unit 3 (described in detail later) are applied to the notation and the reading, respectively, in the analysis result column of the analysis processing result table 7, and the results are written to the analysis processing result table 7. The result of applying the notation rules is written to the notation normalization column, and the result of applying the reading rules is written to the notation + reading normalization column. (If no rule applies, the notation in the analysis result column is written unchanged to the notation normalization column, and the reading in the analysis result column is written unchanged to the notation + reading normalization column.) Proceed to step s12-3.

(Step s12-3) The syllable normalization rules stored in the syllable normalization rule storage unit 4 (described in detail later) are applied to the reading in the analysis result column of the analysis processing result table 7 and to the reading written to the notation + reading normalization column in step s12-2, and the results are written to the analysis processing result table 7. The normalized reading from the analysis result column is written to the reading normalization column, and the normalized reading from the notation + reading normalization column overwrites that column. (If no rule applies, the reading in the analysis result column is written unchanged to the reading normalization column, and the notation + reading normalization column is left as it is, without overwriting.)

The synonym candidate pair creation unit 13 exhaustively combines all records of the analysis processing result table 7 that have been processed by the normalization processing unit 12 to create synonym candidate pairs, and stores the IDs (candidate IDs) of the synonym candidate expressions in each pair in the synonym candidate pair list storage unit 8 as a synonym candidate pair list. FIG. 5 shows an example of the synonym candidate pair list. If coverage is not a priority, a pair creation method other than exhaustive combination, for example one using meta-information, may be used, and the subsequent processing of the present invention still applies.

[Synonymy determination unit 2]
FIG. 6 shows the details of the synonymy determination unit 2, which consists of a notation similarity determination unit 21, a reading similarity determination unit 22, and an abbreviation determination unit 23. FIG. 7 shows the flow of processing in the synonymy determination unit 2.

The synonymy determination unit 2 takes the analysis result table 7 and the synonym candidate pair list 8 as input, uses the notation similarity determination unit 21, the reading similarity determination unit 22, and the abbreviation determination unit 23 to determine whether the two synonym candidate expressions in each synonym candidate pair are synonymous, and outputs a synonym pair list.

Specifically, the notation similarity determination unit 21 judges synonymy from the normalized notations (step s21); the reading similarity determination unit 22 judges synonymy from the normalized readings using the syllable similarity table 5 (step s22); and the abbreviation determination unit 23 judges synonymy by using the abbreviation determination model 6 to decide, from both the normalized notations and the normalized readings, whether one expression is an abbreviation of the other (step s23). Steps s21 to s23 are repeated for each record of the synonym candidate pair list 8. As soon as a pair is judged synonymous at any stage, its two synonym candidate expressions are recognized as synonyms (step s24) and processing moves on to the next synonym candidate pair. A pair that is not judged synonymous at any stage is not recognized as a synonym pair (step s25). The notation, reading, part of speech, and other information about each synonym candidate expression needed for this processing are looked up in the analysis result table 7 using the candidate IDs in the synonym candidate pair list 8. When all records of the synonym candidate pair list 8 have been processed, the pairs recognized as synonyms are output as the synonym pair list.
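The control flow of steps s21 to s25 is a short-circuiting cascade, which can be sketched as below. The names `is_synonym_pair` and `synonym_pairs` and the stand-in judges are hypothetical; the real judges correspond to units 21, 22, and 23.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]
Judge = Callable[[Pair], bool]

def is_synonym_pair(pair: Pair, judges: Iterable[Judge]) -> bool:
    """Run the judges in order and stop at the first one that reports synonymy."""
    for judge in judges:
        if judge(pair):
            return True   # corresponds to step s24
    return False          # corresponds to step s25

def synonym_pairs(pairs: Iterable[Pair], judges: List[Judge]) -> List[Pair]:
    """Keep only the pairs recognized as synonyms."""
    return [p for p in pairs if is_synonym_pair(p, judges)]

# Stand-in judges for illustration only.
same_notation: Judge = lambda p: p[0] == p[1]
never: Judge = lambda p: False

result = synonym_pairs(
    [("アップル", "アップル"), ("アップル", "EVA")],
    [same_notation, never, never],
)
```

Ordering the cheap exact-match judge first means the classifier-based judge runs only for pairs that the earlier stages could not resolve.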

The flow of processing in the notation similarity determination unit 21 is described with reference to FIG. 8. The input is one record of the synonym candidate pair list 8. The judgment here looks at the normalized notation of each synonym candidate expression in the pair.

(Step s21-1) If the notations in the notation normalization column of the two synonym candidate expressions are exactly identical, go to step s21-2. Otherwise, end the processing in the notation similarity determination unit 21 and proceed to the reading similarity determination unit 22.

(Step s21-2) Judge the pair synonymous and end the processing in the synonymy determination unit 2.

The flow of processing in the reading similarity determination unit 22 is described with reference to FIG. 9. The input is the record of a synonym candidate pair that was not judged synonymous by the notation similarity determination unit 21. The judgment here looks at the readings of each synonym candidate expression in the pair. Each synonym candidate expression has three readings: the reading in the analysis result column, the reading in the reading normalization column, and the reading in the notation+reading normalization column. The following processing is therefore repeated up to three times, once for each reading. If the pair is judged synonymous in any iteration, it is recognized as a synonym pair and the processing in the synonymy determination unit 2 ends; otherwise processing proceeds to the abbreviation determination unit 23. Below, each of these three readings is referred to simply as the "reading". The morpheme delimiter ("/") is ignored during matching.

(Step s22-1) If the readings of the two synonym candidate expressions in the pair are exactly identical, go to step s22-5; otherwise go to step s22-2.

(Step s22-2) Count the syllables in the reading of each synonym candidate expression in the pair. If the counts are equal, go to step s22-3. Otherwise end this iteration: if this is the first or second iteration in the reading similarity determination unit 22, return to step s22-1 and process the next reading of the pair; if it is the third, end the processing in the reading similarity determination unit 22.

(Step s22-3) For syllables of the two readings that occupy the same syllable position but differ, obtain the distance between them using the syllable similarity table 5 (described in detail later). If several positions differ, use the sum of the distances over all differing syllable pairs. Go to step s22-4.

(Step s22-4) If the sum of the distances is smaller than a preset threshold, go to step s22-5. Otherwise end this iteration: if this is the first or second iteration in the reading similarity determination unit 22, return to step s22-1 and process the next reading of the pair; if it is the third, end the processing in the reading similarity determination unit 22.

(Step s22-5) Judge the pair synonymous and end the processing in the synonymy determination unit 2.
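Steps s22-1 to s22-5 can be sketched for a single pair of readings as follows. This is an illustration under stated assumptions: the syllable splitter (small kana, ッ, and ー attach to the preceding kana), the table entry, and the decision to reject a differing syllable pair that is missing from the table are all hypothetical. Only the threshold value 0.9 comes from the worked example later in the text.

```python
ATTACHING = set("ァィゥェォャュョヮッー")  # assumed to join the preceding kana

def split_syllables(reading):
    """Split a katakana reading into syllables, ignoring the "/" delimiter."""
    units = []
    for ch in reading.replace("/", ""):
        if ch in ATTACHING and units:
            units[-1] += ch
        else:
            units.append(ch)
    return units

def readings_similar(r1, r2, table, threshold):
    """One iteration of steps s22-1 to s22-5 for one pair of readings."""
    if r1.replace("/", "") == r2.replace("/", ""):
        return True                        # s22-1 -> s22-5
    s1, s2 = split_syllables(r1), split_syllables(r2)
    if len(s1) != len(s2):
        return False                       # s22-2: syllable counts differ
    total = 0.0
    for a, b in zip(s1, s2):               # s22-3: sum distances of differing syllables
        if a != b:
            key = frozenset((a, b))
            if key not in table:
                return False               # assumption: unknown pairs are rejected
            total += table[key]
    return total < threshold               # s22-4

# Hypothetical table entry; the real values come from FIG. 13(b).
table = {frozenset(("バ", "ヴァ")): 0.1}
similar = readings_similar("エバ", "エヴァ", table, threshold=0.9)
```

In the full procedure this function would be called up to three times per pair, once for each of the three reading columns, stopping at the first success.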

The flow of processing in the abbreviation determination unit 23 is described with reference to FIG. 10. The input is the record of a synonym candidate pair that was not judged synonymous by the reading similarity determination unit 22. For the pair, the following processing is repeated for each of four patterns: the two notations in the analysis result column, the two notations in the notation normalization column, the two readings in the reading normalization column, and the two readings in the notation+reading normalization column. If the pair is judged synonymous in any iteration, it is recognized as a synonym pair; otherwise it is recognized as not being a synonym pair, and the processing in the synonymy determination unit 2 ends. Below, the notation in the analysis result column and the notation in the notation normalization column are referred to simply as the "notation", and the reading in the reading normalization column and the reading in the notation+reading normalization column are referred to simply as the "reading". The morpheme delimiter ("/") is ignored during matching.

(Step s23-1) If the two notations (when notations are being processed) or the two readings (when readings are being processed) are in an inclusion relation, go to step s23-2. Otherwise end this iteration: if this is the third or an earlier iteration in the abbreviation determination unit 23, move on to the next notation or reading of the pair; if it is the fourth, end the processing in the abbreviation determination unit 23.

(Step s23-2) Align the two strings using, for example, DP matching (Richard E. Bellman, "Dynamic Programming", 1957). Go to step s23-3.

(Step s23-3) Treating the member of the pair with more characters (more syllables, for readings) as the pre-abbreviation form and the shorter one as the post-abbreviation form, extract the features to be fed to the classifier (described in detail later) from the differences between the two forms. Go to step s23-4.

(Step s23-4) Using the features extracted in step s23-3 and the abbreviation determination model 6, which is the model of the classifier, judge whether the two synonym candidate expressions in the pair are in an abbreviation relation. If they are judged to be, go to step s23-5. Otherwise end this iteration: if this is the third or an earlier iteration in the abbreviation determination unit 23, move on to the next notation or reading of the pair; if it is the fourth, end the processing in the abbreviation determination unit 23.

(Step s23-5) Judge the pair synonymous and end the processing in the synonymy determination unit 2.
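The inclusion check of step s23-1 and the alignment of step s23-2 can be sketched with a longest-common-subsequence dynamic program. The text does not define "inclusion relation" precisely; this sketch assumes it means that every character of the shorter string appears, in order, in the longer one. The function names are hypothetical, and the classifier of step s23-4 is not shown.

```python
def align(long_s, short_s):
    """DP alignment: return [(char_of_long, kept)] marking which characters survive."""
    n, m = len(long_s), len(short_s)
    # lcs[i][j] = length of the longest common subsequence of long_s[i:] and short_s[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if long_s[i] == short_s[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    aligned, i, j = [], 0, 0
    while i < n:
        if j < m and long_s[i] == short_s[j]:
            aligned.append((long_s[i], True))    # character kept in the short form
            i, j = i + 1, j + 1
        elif j == m or lcs[i + 1][j] >= lcs[i][j + 1]:
            aligned.append((long_s[i], False))   # character omitted in the short form
            i += 1
        else:
            j += 1                               # short-form character with no counterpart
    return aligned

def in_inclusion_relation(a, b):
    """Assumed reading of step s23-1: the shorter string is a subsequence of the longer."""
    a, b = a.replace("/", ""), b.replace("/", "")
    long_s, short_s = (a, b) if len(a) >= len(b) else (b, a)
    kept = "".join(ch for ch, keep in align(long_s, short_s) if keep)
    return kept == short_s

alignment = align("八景島シーパラダイス", "シーパラ")
```

The kept/omitted flags produced by the alignment are the kind of difference from which the features of step s23-3 would be derived.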

When features are extracted from the notations of the synonym candidate expressions in a pair (the notation in the analysis result column or in the notation normalization column), the features used in step s23-3 above are as follows.
<Synonym candidate expressions before and after abbreviation>
・Before abbreviation: number of morphemes, number of characters, part of speech, named entity class, character type
・After abbreviation: number of morphemes, number of characters, part of speech, named entity class, character type
<Morpheme-level features>
・When a whole morpheme is omitted: part of speech, notation, number of characters, character type, position (beginning, end, or middle), whether the first morpheme remains
・When a whole morpheme remains: part of speech, notation, number of characters, character type, position (beginning, end, or middle), whether the last morpheme was omitted
<Character-level features>
・When a character is omitted: part of speech, notation, character type, position (beginning, end, or middle), whether the first character of the notation was omitted
・When a character remains: part of speech, notation, character type, position (beginning, end, or middle), whether the first character within the morpheme remains
Morphological analysis information, context information, and the like may also be used in addition to the features listed here. The part of speech, notation, and similar information are taken from the analysis result table 7.

When features are extracted from the readings of the synonym candidate expressions in a pair (the reading in the reading normalization column or in the notation+reading normalization column), the same feature set is used with the following substitutions: the reading in place of the notation, the number of syllables in place of the number of characters, and the syllable index (the position counted in syllables) in place of the position. The character type feature is uniformly "katakana".

[Inverse transformation rule (storage unit) 3]
FIG. 11 shows an example of the inverse transformation rules 3. They are heuristically written rules, such as rules that delete affixes (prefixes, suffixes, and the like) commonly attached to nouns in use, and rules that delete the repetition used in forming nicknames. All rules applicable to a synonym candidate pair are applied. These rules delete from the notation of a synonym candidate expression a formulaic character string presumed to have been added to a more general notation, and correct the reading to match the deletion. They are therefore different from rules for creating abbreviations (in the present invention, abbreviations are not judged with the inverse transformation rules but with a classifier in the abbreviation determination unit). The rules can thus be written simply, for example as regular expressions, and are limited to rules that delete character strings inserted in common across many synonyms. The regular expressions in FIG. 11 are written in Perl syntax by way of example; if another notation is used, its own conventions apply.

Each inverse transformation rule 3 is a pair consisting of a rule applied to the notation and a rule applied to the reading. If the notation regular expression does not apply, the paired reading regular expression is not applied either. Conversely, if the reading regular expression does not apply or does not exist, the notation regular expression alone may be applied, because a reading cannot change without the notation changing, whereas the notation can change without the reading changing. Rules other than those in FIG. 11 can be created and registered freely; for example, 「ちゃん」 or 「氏」 could be added to the suffix deletion rules.
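The pairing of notation and reading rules can be sketched as follows. The two rules shown are hypothetical stand-ins written with Python's `re` module; they are not the rules of FIG. 11, which is not reproduced in the text. The optional `/` in each pattern is an assumption made so that the morpheme delimiter is removed together with the suffix.

```python
import re

# (notation pattern, reading pattern or None). Hypothetical suffix-deletion rules.
RULES = [
    (re.compile(r"/?社$"), re.compile(r"/?シャ$")),
    (re.compile(r"/?様$"), re.compile(r"/?サマ$")),
]

def apply_inverse_rules(notation, reading, rules=RULES):
    """Apply every applicable rule; a reading rule runs only if its notation rule matched."""
    for notation_re, reading_re in rules:
        new_notation, hits = notation_re.subn("", notation)
        if hits == 0:
            continue                      # notation rule not applicable: skip the pair
        notation = new_notation
        if reading_re is not None:        # the reading rule may be absent
            reading = reading_re.sub("", reading)
    return notation, reading

normalized = apply_inverse_rules("アップル/社", "アップル/シャ")
```

Gating the reading rule on the notation rule prevents a reading from being shortened when the surface form gives no evidence of the suffix.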

[Syllable normalization rule (storage unit) 4]
FIG. 12 shows an example of the syllable normalization rules 4. To normalize differences in the unit of reading length between native Japanese words and loanwords (mora versus syllable), colloquial forms, variation in transliteration, and the like, they consist of rules applied to vowel sequences, long vowels, and geminate consonants (sokuon) in the readings of synonym candidate expressions. Rules are applied in rule-number order, and every applicable rule is applied.

Synonymy determination for notation conversion uses the distance between syllables that occupy the same position but are read differently. This presupposes that synonyms related by notation conversion have the same number of syllables. However, the unit of reading differs: the mora for native Japanese words and the syllable for loanwords. In addition, when a foreign-language word is given a Japanese reading (transliteration), even variants of the same loanword may not have the same number of syllables. Examples include the following.

・Native Japanese word, colloquial form: 「ユウコ」, 「ユーコ」
Both are 3 moras when counted in moras, but they are 3 syllables and 2 syllables respectively when counted in syllables.

・Loanword: 「スパゲティ」, 「スパゲティー」, 「スパゲッティー」
All three are 4 syllables when counted in syllables, but they are 4, 5, and 6 moras respectively when counted in moras.

・Japanese reading of a foreign-language word: 「ウインブルドン」, 「ウィンブルドン」
They are 7 and 6 syllables when counted in syllables, and likewise 7 and 6 moras when counted in moras.

Applying syllable normalization rules such as those shown in FIG. 12 to these cases unifies the unit of reading length so that every reading can be counted in syllables, which makes the position-by-position alignment in step s22-3 possible.
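The mora and syllable counts quoted in the examples above can be reproduced with a small counter. The counting convention is inferred from those figures and is an assumption: small kana join the preceding kana in both counts, ッ and ー add a mora but not a syllable, and ン is counted as its own unit in both. The function names are hypothetical, and this is not the normalization of FIG. 12 itself.

```python
SMALL_KANA = set("ァィゥェォャュョヮ")   # combine with the preceding kana
LENGTH_MARKS = set("ッー")               # sokuon and long-vowel mark

def count_moras(reading):
    """Every kana is one mora except small kana, which join the previous kana."""
    return sum(1 for ch in reading.replace("/", "") if ch not in SMALL_KANA)

def count_syllables(reading):
    """Like count_moras, but ッ and ー also join the previous syllable."""
    return sum(
        1
        for ch in reading.replace("/", "")
        if ch not in SMALL_KANA and ch not in LENGTH_MARKS
    )

for word in ["ユウコ", "ユーコ", "スパゲティ", "スパゲティー", "スパゲッティー",
             "ウインブルドン", "ウィンブルドン"]:
    print(word, count_syllables(word), count_moras(word))
```

The mismatched counts for 「ユウコ」 and 「ユーコ」, or for 「ウインブルドン」 and 「ウィンブルドン」, are exactly the cases that normalization has to reconcile before position-by-position comparison.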

[Syllable similarity table (storage unit) 5]
FIG. 13(a) shows the procedure for creating the syllable similarity table, and FIG. 13(b) shows an example of the table. The syllable similarity table 5 consists of keys, each a pair of syllables that are written differently but read similarly, and values, each a distance (similarity). As shown in FIG. 13(a), the table is created as follows. Words that share the same standard notation but differ in pronunciation (reading) are collected from a morphological analysis dictionary; the syllable normalization rules 4 are applied, as in the normalization processing unit 12, to unify the unit of reading length to the syllable; and readings with equal syllable counts are aligned. Syllable pairs that occupy the same position but differ are then extracted and counted, and the number of occurrences of each syllable pair divided by the sum of the occurrence counts of its two syllables is taken as the distance (similarity) between those syllables.
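The table construction can be sketched as below. Several points are assumptions: the syllable splitter, the decision to count syllable occurrences only within the aligned word pairs, and the omission of the normalization step. The input readings are made-up examples, not dictionary data. The text calls the resulting value both a distance and a similarity; the sketch simply computes the ratio as described.

```python
from collections import Counter

ATTACHING = set("ァィゥェォャュョヮッー")  # assumed to join the preceding kana

def split_syllables(reading):
    """Split a katakana reading into syllables."""
    units = []
    for ch in reading:
        if ch in ATTACHING and units:
            units[-1] += ch
        else:
            units.append(ch)
    return units

def build_similarity_table(reading_pairs):
    """reading_pairs: readings of words that share one standard notation."""
    pair_counts = Counter()
    syllable_counts = Counter()
    for r1, r2 in reading_pairs:
        s1, s2 = split_syllables(r1), split_syllables(r2)
        if len(s1) != len(s2):
            continue                         # only equal-length readings are aligned
        syllable_counts.update(s1)
        syllable_counts.update(s2)
        for a, b in zip(s1, s2):
            if a != b:
                pair_counts[frozenset((a, b))] += 1
    return {
        pair: n / sum(syllable_counts[s] for s in pair)
        for pair, n in pair_counts.items()
    }

table = build_similarity_table([("エバ", "エヴァ"), ("バイオリン", "ヴァイオリン")])
```

Here バ and ヴァ each occur twice and are paired twice, so the entry for that pair is 2 / (2 + 2).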

[Abbreviation determination model (storage unit) 6]
The abbreviation determination model 6 is a model for judging whether two words are in an abbreviation relation. It consists of a discriminant function that takes as input the notation and reading, morphological analysis information, alignment information, and so on of each synonym candidate expression in the pair to be judged, and makes a binary decision on whether they are synonymous. As the discriminant function, for example, that of the Support Vector Machine (SVM) described in V. Vapnik, "The Nature of Statistical Learning Theory", Springer, 1995 is used; its parameters are determined in advance by training an SVM on training data composed of the features described for the abbreviation determination unit 23. SVM is given here as the learning algorithm, but other learning algorithms such as decision trees or the maximum entropy method may also be used.

A concrete example of the processing in the embodiment described above is now explained in detail. Here, the threshold used in the reading similarity determination unit 22 of the synonymy determination unit 2 is "0.9", and an SVM is used as the classifier of the abbreviation determination model 6. First, the processing performed by the synonym candidate generation unit 1 and the synonymy determination unit 2 is described. Then detailed application examples of the inverse transformation rules 3 and the syllable normalization rules 4, and examples of creating the syllable similarity table 5 and the abbreviation determination model 6, are described.

[I] Processing performed by the synonym candidate generation unit 1 and the synonymy determination unit 2
[Synonym candidate generation unit 1]
When the input text to the analysis processing unit 11 is as shown in FIG. 14, synonym candidate expressions such as 「アップル社」 and 「eva」 are extracted. The expression 「アップル社」 appears twice in the input text, but it is written to the analysis result table 7 only once. The following description assumes that the state after the whole text has been analyzed is as shown in FIG. 15.

The input to the normalization processing unit 12 is the list formed by the synonym candidate expression column of all records of the analysis result table 7 shown in FIG. 15, and the following processing is repeated for each record in the list.

Processing starts with 「アップル社」 (ID1).

(Step s12-1) The notation in the analysis result column, 「アップル/社」, is not alphabetic, so processing goes straight to step s12-2. (An example that is processed at this step: in record ID2, eva is converted to uppercase EVA and overwritten.)

(Step s12-2) Applying the notation regular expression and the reading regular expression of the inverse transformation rules 3 (described in detail later) to the analysis result column notation 「アップル/社」 and reading 「アップル/シャ」 yields 「アップル」 and 「アップル」 respectively. These results are written to the notation normalization column and the notation+reading normalization column of the analysis result table 7, and processing goes to step s12-3.

(Step s12-3) The syllable normalization rules 4 (described in detail later) are applied to the analysis result column reading 「アップル/シャ」 and to the notation+reading normalization column reading 「アップル」 written in step s12-2. The former becomes 「アプル/シャ」 and is written to the reading normalization column. The latter becomes 「アプル」 and overwrites the notation+reading normalization column.

The same processing is repeated for ID2 onward (FIG. 15 shows records up to ID14). The result is shown in FIG. 3.

In step s12-2, the inverse transformation rules 3 are applied to the following four records of the analysis result table 7 shown in FIG. 3 (in the examples from here on, the morpheme delimiter "/" is omitted unless it is needed):
・Record ID1: アップル社 (→ notation: アップル, notation+reading: アップル)
・Record ID10: ショコタン (→ notation: ショコ, notation+reading: ショコ)
・Record ID12: ヨン様 (→ notation: ヨン, notation+reading: ヨン)
・Record ID14: ミキティ (→ notation: ミキ, notation+reading: ミキ)

In step s12-3, the syllable normalization rules 4 are applied to the following five records of the analysis result table 7 shown in FIG. 3:
・Record ID1: アップル社 (→ reading: アプルシャ, notation+reading: アプル)
・Record ID2: ショウコ (→ reading: ショコ, notation+reading: ショコ)
・Record ID4: 八景島シーパラダイス (→ reading: ハケイジマシパラダイス, notation+reading: ハケイジマシパラダイス)
・Record ID8: アップル (→ reading: アプル, notation+reading: アプル)
・Record ID11: シーパラ (→ reading: シパラ, notation+reading: シパラ)

The synonym candidate pair creation unit 13 creates synonym candidate pairs by exhaustively combining all records of the analysis result table 7 shown in FIG. 3, which was produced by the normalization processing unit 12, and outputs the synonym candidate pair list 8 shown in FIG. 5.

[Synonymy determination unit 2]
The inputs to the synonymy determination unit 2 are the synonym candidate pair list 8 shown in FIG. 5 and the analysis result table 7 shown in FIG. 3. For each record of list 8 in FIG. 5, the processing from the notation similarity determination unit 21 through the abbreviation determination unit 23 is repeated. As soon as a pair is judged synonymous at any stage, the pair is recognized as synonymous and processing moves on to the next synonym candidate pair. A pair that is not judged synonymous at any stage is not recognized as a synonym pair. When all records have been processed, the synonym candidate pairs recognized as synonyms are output as the synonym pair list.

First, processing starts with record ID1 (「アップル社」, 「EVA」) of the synonym candidate pair list 8.

The notation similarity determination unit 21 looks up candidate IDs 1 and 2, which correspond to record ID1 of the synonym candidate pair list 8, in the analysis result table 7. The notations in the notation normalization column are 「アップル」 and 「EVA」, which differ, so the processing in the notation similarity determination unit 21 ends and processing proceeds to the reading similarity determination unit 22.

The reading similarity determination unit 22 looks up the readings for candidate IDs 1 and 2, which correspond to record ID1 of the synonym candidate pair list 8, in the analysis result table 7 and makes its judgment. The readings of this synonym candidate pair are: analysis result column, 「アップルシャ」, 「エバ」; reading normalization column, 「アプルシャ」, 「エバ」; and notation+reading normalization column, 「アプル」, 「エバ」. The following processing is repeated three times, once for each of these.

First, the readings in the analysis result column, 「アップルシャ」 and 「エバ」, are processed.

(Step s22-1) The readings of the synonym candidate pair differ, so go to step s22-2.

(Step s22-2) Counting the syllables of the readings gives 4 for 「アップルシャ」 and 2 for 「エバ」. The counts differ, so processing moves to the next iteration.

Processing returns to step s22-1 and is repeated for 「アプルシャ」, 「エバ」 and for 「アプル」, 「エバ」, but in both cases the syllable counts differ and the pair is not judged synonymous, so processing proceeds to the abbreviation determination unit 23.

The abbreviation determination unit 23 judges the pair of record ID1 of the synonym candidate pair list 8 on the following four kinds of information, looked up in the analysis result table 7: the notations in the analysis result column, 「アップル社」, 「EVA」; the notations in the notation normalization column, 「アップル」, 「EVA」; the readings in the reading normalization column, 「アプルシャ」, 「エバ」; and the readings in the notation+reading normalization column, 「アプル」, 「エバ」. The following processing is repeated for each of these four patterns.

First, processing starts with the notations in the analysis result column, 「アップル社」 and 「EVA」.

(Step s23-1) The notations are not in an inclusion relation, so processing moves to the next iteration. 「アップル」, 「EVA」; 「アプルシャ」, 「エバ」; and 「アプル」, 「エバ」 are then processed in turn, but none of them is in an inclusion relation, so the processing in the abbreviation determination unit 23 ends.

Record ID1 of the synonym candidate pair list 8 was never judged synonymous during the processing in the synonymy determination unit 2, so it is not recognized as a synonym pair, and processing moves to the next record, ID2.

ここで、レコードID2(「アップル社」,「八景島シーパラダイス」)、レコードID3(「アップル社」,「翔子」)もレコードID1と同様に同義とならないので、以下、図5のリスト8内で最終的に同義と判定されるレコードID7,ID22,ID45,ID63,ID78,ID90,ID121のうち、代表的なパターンであるID7,ID22,ID45,ID63,ID121に絞って同義性判定部2での処理を説明する。   Here, the record ID 2 (“Apple Inc.”, “Hakkeijima Sea Paradise”) and the record ID 3 (“Apple Inc.”, “Shoko”) are not synonymous with the record ID 1, so in the list 8 of FIG. Of the record ID7, ID22, ID45, ID63, ID78, ID90, and ID121 that are finally determined to be synonymous, the synonymity determination unit 2 narrows down to representative patterns ID7, ID22, ID45, ID63, and ID121. Processing will be described.

★ Record ID 7 ("アップル社" [Apple Inc.], "アップル" [Apple])
[Notation similarity determination unit 21]
(Step s21-1) Looking up candidate IDs 1 and 8 of the synonym candidate pair list 8 in the analysis processing result table 7 gives the notations "アップル" and "アップル" in the notation normalization column. They are identical, so processing proceeds to step s21-2.

(Step s21-2) The pair is judged synonymous, and the processing in the synonymity determination unit 2 ends.

★ Record ID 22 ("EVA", "エヴァ")
[Notation similarity determination unit 21]
(Step s21-1) Looking up candidate IDs 2 and 9 of the synonym candidate pair list 8 in the analysis processing result table 7 gives the notations "EVA" and "エヴァ" in the notation normalization column. They differ, so the processing of the notation similarity determination unit 21 ends and processing proceeds to the reading similarity determination unit 22.

[Reading similarity determination unit 22]
Looking up candidate IDs 2 and 9 of the synonym candidate pair list 8 in the analysis processing result table 7 yields the following three patterns, and the processing below is repeated for each:
・Readings in the analysis result column: "エバ", "エヴァ"
・Readings in the reading normalization column: "エバ", "エヴァ"
・Readings in the notation + reading normalization column: "エバ", "エヴァ"

First, the readings in the analysis result column, "エバ" and "エヴァ", are processed.

(Step s22-1) The readings differ, so processing proceeds to step s22-2.

(Step s22-2) The syllables of the readings are counted. Both have two syllables, so processing proceeds to step s22-3.

(Step s22-3) The syllables that occupy the same position but are read differently are "バ" and "ヴァ", and the syllable similarity table 5 (FIG. 13(b); details are described later) gives their distance as 0.87. This synonym candidate pair has only one such syllable pair, so the distance between the two readings is 0.87. Processing proceeds to step s22-4.

(Step s22-4) The sum of the distances, 0.87, is smaller than the preset threshold of 0.9, so processing proceeds to step s22-5.

(Step s22-5) The pair is judged synonymous, and the processing in the synonymity determination unit 2 ends.
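Steps s22-1 to s22-5 can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the syllable splitter (small kana, "ッ" and "ー" attach to the preceding kana), the one-entry distance table, and the handling of pairs missing from the table are all assumptions. Only the 0.87 distance and the 0.9 threshold come from the walkthrough above.

```python
# Kana that are assumed to attach to the preceding kana when counting syllables.
ATTACHED = set("ァィゥェォャュョヮッー")

def split_syllables(reading):
    """Split a katakana reading into syllable units (assumed segmentation)."""
    units = []
    for ch in reading:
        if units and ch in ATTACHED:
            units[-1] += ch
        else:
            units.append(ch)
    return units

# Hypothetical excerpt of syllable similarity table 5 (only this entry is from the text).
SYLLABLE_DISTANCE = {frozenset(("バ", "ヴァ")): 0.87}
THRESHOLD = 0.9

def readings_synonymous(r1, r2, table=SYLLABLE_DISTANCE, threshold=THRESHOLD):
    if r1 == r2:                        # step s22-1: identical readings
        return True
    s1, s2 = split_syllables(r1), split_syllables(r2)
    if len(s1) != len(s2):              # step s22-2: syllable counts must match
        return False
    total = 0.0
    for a, b in zip(s1, s2):            # step s22-3: sum distances of differing syllables
        if a != b:
            distance = table.get(frozenset((a, b)))
            if distance is None:        # assumption: an unlisted pair is not similar
                return False
            total += distance
    return total < threshold            # step s22-4: compare the sum with the threshold
```

With this sketch, `readings_synonymous("エバ", "エヴァ")` is true because the single differing pair contributes 0.87, while "ショウコ" and "ショコタン" are rejected at the syllable-count check.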

★ Record ID 45 ("翔子" [Shoko], "ショコタン" [Shokotan])
[Notation similarity determination unit 21]
(Step s21-1) Looking up candidate IDs 3 and 10 of the synonym candidate pair list 8 in the analysis processing result table 7 gives the notations "翔子" and "ショコ" in the notation normalization column. They differ, so the processing of the notation similarity determination unit 21 ends and processing proceeds to the reading similarity determination unit 22.

[Reading similarity determination unit 22]
Looking up candidate IDs 3 and 10 of the synonym candidate pair list 8 in the analysis processing result table 7 yields the following three patterns, and the processing below is repeated for each:
・Readings in the analysis result column: "ショウコ", "ショコタン"
・Readings in the reading normalization column: "ショコ", "ショコタン"
・Readings in the notation + reading normalization column: "ショコ", "ショコ"

First, the readings in the analysis result column, "ショウコ" and "ショコタン", are processed.

(Step s22-1) The readings differ, so processing proceeds to step s22-2.

(Step s22-2) The syllables of the readings are counted: "ショウコ" has three syllables and "ショコタン" has four. Because the counts differ, processing moves to the next iteration.

Next, the readings in the reading normalization column, "ショコ" and "ショコタン", are processed.

(Step s22-1) The readings differ, so processing proceeds to step s22-2.

(Step s22-2) The syllables of the readings are counted: "ショコ" has two syllables and "ショコタン" has four. Because the counts differ, processing moves to the next iteration.

Next, the readings in the notation + reading normalization column, "ショコ" and "ショコ", are processed.

(Step s22-1) The readings are identical, so processing proceeds to step s22-5.

(Step s22-5) The pair is judged synonymous, and the processing in the synonymity determination unit 2 ends.

★ Record ID 63 ("八景島シーパラダイス" [Hakkeijima Sea Paradise], "シーパラ" [Sea Para])
[Notation similarity determination unit 21]
(Step s21-1) Looking up candidate IDs 4 and 11 of the synonym candidate pair list 8 in the analysis processing result table 7 gives the notations "八景島シーパラダイス" and "シーパラ" in the notation normalization column. They differ, so the processing of the notation similarity determination unit 21 ends and processing proceeds to the reading similarity determination unit 22.

[Reading similarity determination unit 22]
Looking up candidate IDs 4 and 11 of the synonym candidate pair list 8 in the analysis processing result table 7 yields the following three patterns, and the processing is repeated for each:
・Readings in the analysis result column: "ハッケイジマシーパラダイス", "シーパラ"
・Readings in the reading normalization column: "ハケイジマシパラダイス", "シパラ"
・Readings in the notation + reading normalization column: "ハケイジマシパラダイス", "シパラ"

The processing is repeated for each of these three patterns in the same way as described above. In every pattern the syllable counts of the two readings differ and the pair is not judged synonymous, so processing proceeds to the omission determination unit 23.

[Omission determination unit 23]
Looking up candidate IDs 4 and 11 of the synonym candidate pair list 8 in the analysis processing result table 7 yields the following four patterns:
・Notations in the analysis result column: "八景島シーパラダイス", "シーパラ"
・Notations in the notation normalization column: "八景島シーパラダイス", "シーパラ"
・Readings in the reading normalization column: "ハケイジマシパラダイス", "シパラ"
・Readings in the notation + reading normalization column: "ハケイジマシパラダイス", "シパラ"

The following processing is repeated for each of these four patterns.

First, the notations in the analysis result column, "八景島シーパラダイス" and "シーパラ", are processed.

(Step s23-1) The notations are in an inclusion relationship, so processing proceeds to step s23-2.

(Step s23-2) Aligning the two notations by the DP matching method gives the result shown at the upper left of FIG. 16. Processing proceeds to step s23-3.

(Step s23-3) Features are extracted as shown for step s23-3 in FIG. 16. The expression before deletion is "八景島シーパラダイス" and the expression after deletion is "シーパラ"; the morpheme that remains after deletion is "シー", the deleted morpheme is "八景島", the deleted characters are "ダ", "イ", and "ス", and the remaining characters are "パ" and "ラ". For each of these six items, features such as the number of morphemes, the number of characters, and the part of speech are extracted with reference to the analysis processing result table 7, as shown on the right side of FIG. 16.

(Step s23-4) The omission determination model 6, which is the model of the classifier, is used to determine whether the synonym candidate pair is in an abbreviation relationship. The result is synonymous, so processing proceeds to step s23-5.

(Step s23-5) The pair is judged synonymous, and the processing in the synonymity determination unit 2 ends.
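The inclusion check and alignment of steps s23-1 to s23-3 can be illustrated with the sketch below. It makes several assumptions: "inclusion" is treated as an in-order subsequence test, a greedy left-to-right match stands in for DP matching, the morpheme segmentation 八景島 / シー / パラダイス is inferred from the items listed in step s23-3, and the function names are hypothetical. The classifier of step s23-4 is not modeled.

```python
def align_subsequence(full, short):
    """Return the indices in `full` matched by `short`, or None when `short`
    is not an in-order subsequence of `full` (greedy stand-in for DP matching)."""
    positions = []
    start = 0
    for ch in short:
        idx = full.find(ch, start)
        if idx < 0:
            return None
        positions.append(idx)
        start = idx + 1
    return positions

def split_kept_deleted(morphemes, short):
    """Classify morphemes and characters of the long form as remaining or deleted."""
    full = "".join(morphemes)
    kept = align_subsequence(full, short)
    if kept is None:
        return None                      # no inclusion relationship (step s23-1)
    kept = set(kept)
    result = {"remaining_morphemes": [], "deleted_morphemes": [],
              "remaining_chars": [], "deleted_chars": []}
    offset = 0
    for morpheme in morphemes:
        indices = range(offset, offset + len(morpheme))
        hits = [i in kept for i in indices]
        if all(hits):                    # morpheme survives intact
            result["remaining_morphemes"].append(morpheme)
        elif not any(hits):              # morpheme dropped entirely
            result["deleted_morphemes"].append(morpheme)
        else:                            # partially kept: record per character
            for i, hit in zip(indices, hits):
                key = "remaining_chars" if hit else "deleted_chars"
                result[key].append(full[i])
        offset += len(morpheme)
    return result
```

Applied to `["八景島", "シー", "パラダイス"]` and "シーパラ", the sketch reproduces the items named in step s23-3; counts and part-of-speech features would then be derived from these lists before classification.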

★ Record ID 121 ("安藤美姫" [Miki Ando], "ミキティ" [Mikiti])
[Notation similarity determination unit 21]
(Step s21-1) Looking up candidate IDs 7 and 14 of the synonym candidate pair list 8 in the analysis processing result table 7 gives the notations "安藤美姫" and "ミキ" in the notation normalization column. They differ, so the processing of the notation similarity determination unit 21 ends and processing proceeds to the reading similarity determination unit 22.

[Reading similarity determination unit 22]
Looking up candidate IDs 7 and 14 of the synonym candidate pair list 8 in the analysis processing result table 7 yields the following three patterns, and the processing is repeated for each:
・Readings in the analysis result column: "アンドウミキ", "ミキティ"
・Readings in the reading normalization column: "アンドウミキ", "ミキティ"
・Readings in the notation + reading normalization column: "アンドウミキ", "ミキ"

The processing is repeated for each of these three patterns in the same way as described above. In every pattern the syllable counts of the two readings differ and the pair is not judged synonymous, so processing proceeds to the omission determination unit 23.

[Omission determination unit 23]
Looking up candidate IDs 7 and 14 of the synonym candidate pair list 8 in the analysis processing result table 7 yields the following four patterns:
・Notations in the analysis result column: "安藤美姫", "ミキティ"
・Notations in the notation normalization column: "安藤美姫", "ミキ"
・Readings in the reading normalization column: "アンドウミキ", "ミキティ"
・Readings in the notation + reading normalization column: "アンドウミキ", "ミキ"

The following processing is repeated for each of these four patterns.

First, the notations in the analysis result column, "安藤美姫" and "ミキティ", are processed.

(Step s23-1) The notations are not in an inclusion relationship, so processing moves to the next iteration.

The next two patterns, "安藤美姫", "ミキ" and "アンドウミキ", "ミキティ", are processed in turn, but neither is in an inclusion relationship, so neither is judged synonymous. The last pattern, "アンドウミキ", "ミキ", is processed as follows.

(Step s23-1) The readings are in an inclusion relationship, so processing proceeds to step s23-2.

From here, steps s23-2 to s23-5 are performed in the same way as in the example of record ID 63 ("八景島シーパラダイス", "シーパラ").

After the synonymity of all synonym candidate pairs in the synonym candidate pair list 8 has been determined in this way, the records judged synonymous (ID 7, ID 22, ID 45, ID 63, ID 78, ID 90, and ID 121) are output as synonym pairs.
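The overall control flow of the walkthrough, in which each determination unit tries its column patterns in order and a record is accepted as soon as any pattern is judged synonymous, might be sketched as follows. The judges here are deliberately simplified stand-ins (plain equality for units 21 and 22, a subsequence test for unit 23 without the classifier), and all function names are hypothetical.

```python
def same_string(left, right):
    """Stand-in judge: identical strings (used here for units 21 and 22)."""
    return left == right

def is_included(left, right):
    """Stand-in judge for unit 23: the shorter string is an in-order
    subsequence of the longer one (assumed meaning of 'inclusion')."""
    short, full = sorted((left, right), key=len)
    remaining = iter(full)
    return all(ch in remaining for ch in short)

def determine_synonymity(stages):
    """Run the stages in order; each stage is (judge, list of (left, right))."""
    for judge, patterns in stages:
        for left, right in patterns:
            if judge(left, right):
                return True          # judged synonymous: stop processing this record
    return False                     # never judged synonymous

# Patterns of record ID 45 and record ID 1, taken from the walkthrough.
record_45 = [
    (same_string, [("翔子", "ショコ")]),
    (same_string, [("ショウコ", "ショコタン"), ("ショコ", "ショコタン"), ("ショコ", "ショコ")]),
]
record_1 = [
    (same_string, [("アップル", "EVA")]),
    (same_string, [("アップルシャ", "エバ"), ("アプルシャ", "エバ"), ("アプル", "エバ")]),
    (is_included, [("アップル社", "EVA"), ("アップル", "EVA"),
                   ("アプルシャ", "エバ"), ("アプル", "エバ")]),
]
```

Record ID 45 is accepted at the third reading pattern, whereas record ID 1 passes through all three stages without a match, mirroring the outcomes described above.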

[II] Detailed application examples of the inverse transformation rule 3 and the syllable normalization rule 4, and examples of creating the syllable similarity table 5 and the omission determination model 6
For the inverse transformation rule 3, this embodiment describes the five rules listed in FIG. 11: prefix deletion, suffix deletion, reading-kana deletion, repeated-expression deletion, and ellipsis-mark deletion.

Prefix deletion: applied to the target "notation: お/吉, reading: オ/キチ", the rule deletes the prefix "お" and yields "notation: 吉, reading: キチ".

Suffix deletion: applied to the target "notation: アップル/社, reading: アップル/シャ", the rule deletes the suffix "社" and yields "notation: アップル, reading: アップル".

Reading-kana deletion: applied to the target "notation: 安/(/やす/)/めぐみ, reading: ヤス/ヤス/メグミ", the rule yields "notation: 安/めぐみ, reading: ヤス/メグミ".

Repeated-expression deletion: applied to the target "notation: キョンキョン, reading: キョンキョン", the rule yields "notation: キョン, reading: キョン".

Ellipsis-mark deletion: applied to the target "notation: ハリーポッター3/ /炎/の/−, reading: ハリーポッターサン/ /ホノオ/ノ", the rule yields "notation: ハリーポッター3/ /炎/の, reading: ハリーポッターサン/ /ホノオ/ノ".
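Two of these rules, affix deletion and repeated-expression deletion, can be sketched as below. The morpheme representation (notation, reading, part-of-speech tag) and the tag names "prefix" and "suffix" are assumptions made for illustration; the actual rules of FIG. 11 may be conditioned differently.

```python
def strip_affixes(morphemes):
    """Drop morphemes tagged as prefix or suffix.
    Each morpheme is a hypothetical (notation, reading, tag) tuple."""
    return [m for m in morphemes if m[2] not in ("prefix", "suffix")]

def collapse_repetition(text):
    """If `text` is one unit repeated two or more times, return the unit."""
    n = len(text)
    for size in range(1, n // 2 + 1):
        if n % size == 0 and text[:size] * (n // size) == text:
            return text[:size]
    return text

def inverse_transform(morphemes):
    """Normalize the notation and keep the reading in step with it."""
    kept = strip_affixes(morphemes)
    notation = collapse_repetition("".join(m[0] for m in kept))
    reading = collapse_repetition("".join(m[1] for m in kept))
    return notation, reading
```

Because the reading is rebuilt from the same surviving morphemes as the notation, deleting "社" also removes "シャ" from the reading. Note that this naive repetition test would also collapse ordinary words that happen to repeat a unit, so a practical rule would need further conditions.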

The syllable normalization rule 4 is described in this embodiment using the four examples listed in FIG. 12: "ユウコ", "ウィンブルドン", "イェルサレム", and "スウィング".

★ "ユウコ"
The same vowel continues across "ユ" and "ウ", so rule 1 applies and the continuous vowel portion becomes a long sound, giving "ユーコ". Rule 5 is then applied to the portion made long by rule 1, giving "ユコ".

★ "ウィンブルドン"
The vowel "ウ" is followed by the small kana "ィ", which is a different vowel, so rule 2 applies, giving "ウインブルドン".

★ "イェルサレム"
The vowel "イ" is followed by the small kana "ェ", which is a different vowel, so rule 2 applies, giving "イエルサレム". In addition, the vowel and small vowel kana matched by rule 2 satisfy the condition of rule 3, so the vowel "イ" is deleted, giving "エルサレム".

★ "スウィング"
The vowel "ウ" is followed by the small kana "ィ", which is a different vowel, so rule 2 applies, giving "スウイング". As a result of applying rule 2, the vowels now run "ウウイ", which satisfies the condition of rule 4, so one of the consecutive identical vowels "ウ" is deleted, giving "スイング".
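The four examples can be reproduced with the following sketch. The exact rule conditions are defined in FIG. 12, which is not reproduced here, so the conditions coded below are assumptions inferred only from these four examples: rule 3 is assumed to fire only for "イ" followed by "ェ", and rules 1 and 5 are folded into a single deletion. The sketch does not cover other normalizations that appear in the walkthrough, such as the handling of "ッ", "ー", or "ショウコ".

```python
ROWS = {
    "a": "アカサタナハマヤラワガザダバパ",
    "i": "イキシチニヒミリギジヂビピ",
    "u": "ウクスツヌフムユルグズヅブプヴ",
    "e": "エケセテネヘメレゲゼデベペ",
    "o": "オコソトノホモヨロヲゴゾドボポ",
}
VOWEL_OF = {kana: vowel for vowel, row in ROWS.items() for kana in row}
PLAIN_VOWELS = set("アイウエオ")
SMALL_TO_LARGE = dict(zip("ァィゥェォ", "アイウエオ"))

def normalize_reading(reading):
    out = []
    for i, ch in enumerate(reading):
        prev = out[-1] if out else None
        nxt = reading[i + 1] if i + 1 < len(reading) else None
        if (ch in SMALL_TO_LARGE and prev in PLAIN_VOWELS
                and VOWEL_OF[prev] != VOWEL_OF[SMALL_TO_LARGE[ch]]):
            large = SMALL_TO_LARGE[ch]           # rule 2: small vowel kana -> full size
            if (prev, large) == ("イ", "エ"):     # rule 3 (assumed condition): drop "イ"
                out.pop()
            elif len(out) >= 2 and VOWEL_OF.get(out[-2]) == VOWEL_OF[prev]:
                out.pop()                         # rule 4: drop one of the repeated vowels
            out.append(large)
        elif (ch in PLAIN_VOWELS and prev is not None
              and nxt not in SMALL_TO_LARGE
              and VOWEL_OF.get(prev) == VOWEL_OF[ch]):
            continue                              # rules 1 + 5: same vowel -> long -> removed
        else:
            out.append(ch)
    return "".join(out)
```

The look-ahead on `nxt` keeps "ウ" in "スウィング" from being absorbed by rules 1 and 5 before rule 2 has had a chance to apply, which matches the order of application described in the text.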

The method of creating the syllable similarity table 5 is described using the example of "アーティスト" (artist) and its variant notations shown in FIG. 13. The morphological analysis dictionary used must handle variant notations and spelling fluctuations and must provide their readings and standard notations. First, the words "アーティスト", "アーテスト", and "アーチスト", which share a standard notation but differ in pronunciation, are collected from the morphological analysis dictionary. Next, the syllable normalization rule 4 is applied to each expression, and the expressions whose syllable counts match are aligned. The normalized forms are "アティスト", "アテスト", and "アチスト", which yield three syllable pairs that occupy the same position but are read differently: "テ and ティ", "チ and ティ", and "テ and チ". In the same way, variant notations are collected from the morphological analysis dictionary with the standard notation as the key, normalized, and searched for differing syllable pairs. Once the whole dictionary has been processed, the pairs that occupy the same position but are read differently, and the syllables that appear, are each counted by type. The table is then created by calculating the distances with the formula in FIG. 13.
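The counting stage of this procedure can be sketched as follows. The input is assumed to be groups of already normalized readings that share a standard notation; the syllable splitter and the choice to count each syllable once per occurrence are assumptions. The distance formula of FIG. 13 is not reproduced in the text, so the sketch stops at the counts.

```python
from collections import Counter
from itertools import combinations

# Kana assumed to attach to the preceding kana when splitting into syllables.
ATTACHED = set("ァィゥェォャュョヮッー")

def split_syllables(reading):
    units = []
    for ch in reading:
        if units and ch in ATTACHED:
            units[-1] += ch
        else:
            units.append(ch)
    return units

def count_differing_pairs(variant_groups):
    """Count same-position syllable pairs that differ, and syllable occurrences."""
    pair_counts = Counter()
    syllable_counts = Counter()
    for readings in variant_groups:
        split = [split_syllables(r) for r in readings]
        for syllables in split:
            syllable_counts.update(syllables)
        for s1, s2 in combinations(split, 2):
            if len(s1) != len(s2):       # only readings with equal syllable counts align
                continue
            for a, b in zip(s1, s2):
                if a != b:
                    pair_counts[frozenset((a, b))] += 1
    return pair_counts, syllable_counts
```

For the group `["アティスト", "アテスト", "アチスト"]` this yields one occurrence each of the pairs テ/ティ, チ/ティ, and テ/チ, which are the three pairs named above.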

As a method of creating the omission determination model 6, an example using an SVM as the discriminant function is described. The text data used for training should, if possible, be taken from the same domain as the input text used in actual operation and be analyzed with the same analyzer as is used in actual operation. Synonym candidate expression pairs are created from the training text by the same method as in this embodiment, and only those whose notations are in an inclusion relationship are extracted. Each entry is then manually labeled as synonymous or not. The same features as in the omission determination unit 23 are extracted, and the omission determination model is created by learning the parameters of the discriminant function. If training also uses the data in the reading column, the notation normalization column, the reading normalization column, and the notation + reading normalization column of the synonym candidate expression pairs created by the synonym candidate pair generation unit 1, an omission determination model corresponding to each column can be created.

Note that the terms inverse transformation rule storage unit, syllable normalization rule storage unit, syllable similarity table storage unit, omission determination model storage unit, analysis processing result table storage unit, and synonym candidate pair list storage unit used in the embodiment reflect functional differences in the kind of data stored; they do not mean that physically separate storage units (storage devices) are required. The present invention can also be realized by installing, on a well-known computer via a medium or a communication line, a program that implements the functions shown in the configuration diagrams of FIGS. 1, 2, and 6, or a program comprising the procedures shown in the flowcharts of FIGS. 4 and 7 to 10.

FIG. 1: Schematic block diagram showing an example of an embodiment of the synonymity determination apparatus of the present invention
FIG. 2: Block diagram showing details of the synonym candidate generation unit
FIG. 3: Explanatory diagram showing an example of the analysis processing result table
FIG. 4: Flowchart of processing in the normalization processing unit
FIG. 5: Explanatory diagram showing an example of the synonym candidate pair list
FIG. 6: Block diagram showing details of the synonymity determination unit
FIG. 7: Flowchart of processing in the synonymity determination unit
FIG. 8: Flowchart of processing in the notation similarity determination unit
FIG. 9: Flowchart of processing in the reading similarity determination unit
FIG. 10: Flowchart of processing in the omission determination unit
FIG. 11: Explanatory diagram showing an example of the inverse transformation rules
FIG. 12: Explanatory diagram showing an example of the syllable normalization rules
FIG. 13: Explanatory diagram showing the procedure for creating the syllable similarity table and an example of the table
FIG. 14: Explanatory diagram showing an example of input text
FIG. 15: Explanatory diagram showing an example of the analysis processing result table at the end of the analysis processing
FIG. 16: Explanatory diagram showing an example of the feature extraction processing in the omission determination unit

Explanation of symbols

1: synonym candidate pair generation unit, 2: synonymity determination unit, 3: inverse transformation rule storage unit, 4: syllable normalization rule storage unit, 5: syllable similarity table storage unit, 6: omission determination model storage unit, 7: analysis processing result table storage unit, 8: synonym candidate pair list storage unit, 11: analysis processing unit, 12: normalization processing unit, 13: pair generation unit, 21: notation similarity determination unit, 22: reading similarity determination unit, 23: omission determination unit.

Claims (10)

A synonymity determination apparatus that extracts, from text, synonym candidate expressions, which are character string expressions serving as synonym candidates, generates synonym candidate pairs, and determines whether the synonym candidate expressions in each synonym candidate pair are synonymous, the apparatus comprising:
an inverse transformation rule storage unit that stores an inverse transformation rule for normalizing the notation of a synonym candidate expression;
a syllable normalization rule storage unit that stores a syllable normalization rule for normalizing the reading of a synonym candidate expression;
a syllable similarity table storage unit that stores a syllable similarity table for obtaining the similarity between the readings of the synonym candidate expressions in a synonym candidate pair;
an omission determination model storage unit that stores an omission determination model for determining whether the synonym candidate expressions in a synonym candidate pair are in an abbreviation relationship;
synonym candidate pair generation means for analyzing input text, extracting synonym candidate expressions from the text based on the analysis result and attaching the corresponding analysis result to each, normalizing the notation and reading of the synonym candidate expressions using the inverse transformation rule stored in the inverse transformation rule storage unit and the syllable normalization rule stored in the syllable normalization rule storage unit, and then combining the synonym candidate expressions to generate synonym candidate pairs each consisting of a pair of synonym candidate expressions; and
synonymity determination means for determining, using the syllable similarity table stored in the syllable similarity table storage unit and the omission determination model stored in the omission determination model storage unit, whether the synonym candidate expressions in the synonym candidate pair are synonymous, and outputting the synonym candidate pair as a synonym pair if they are synonymous.
The synonymity determination apparatus according to claim 1, wherein
the inverse transformation rule normalizes the notation and, together with the normalization of the notation, normalizes the reading, and includes at least a rule for deleting affixes that are commonly attached when a noun is used and a rule for deleting repeated expressions that are used when a nickname is created, and
the synonym candidate pair generation means performs normalization processing that applies the inverse transformation rule to the notation in the analysis result of a synonym candidate expression to delete at least affixes and repeated expressions, and corrects the reading in accordance with the deletion.
The synonymity determination apparatus according to claim 1, wherein
the syllable normalization rule, by being applied to consecutive vowels, long sounds, and geminate consonants in a reading, normalizes at least the unit of reading length that differs between native Japanese words and loanwords, colloquial expressions, and fluctuations in transliteration, and
the synonym candidate pair generation means performs normalization processing that applies the syllable normalization rule to the consecutive vowels, long sounds, and geminate consonants in the reading in the analysis result of a synonym candidate expression and in the reading of the notation normalized by the inverse transformation rule.
The synonymity determination apparatus according to claim 1, wherein
the syllable similarity table, which is used to obtain the similarity between the readings of the synonym candidate expressions in a synonym candidate pair, has as its key a "pair of syllables that differ in notation but are similar in reading" and as its value a "distance", and
the synonymity determination means, for readings of the synonym candidate expressions in a synonym candidate pair that have the same number of syllables, uses the syllable similarity table to obtain the sum of the distances between syllables that occupy the same position but are read differently, and judges the expressions synonymous if the sum of the distances is smaller than a preset threshold.
テキストから同義語候補としての文字列表現である同義語侯補表現を抽出して同義語侯補ペアを生成し、当該同義語侯補ペア中の同義語侯補表現同士が同義か否かを判定する同義性判定方法であって、
同義語侯補表現の表記を正規化するための逆変換ルールを記憶する逆変換ルール記憶部と、
同義語侯補表現の読みを正規化するための音節正規化ルールを記憶する音節正規化ルール記憶部と、
同義語侯補ペア中の同義語侯補表現同士の読みの類似度を求めるための音節類似度テーブルを記憶する音節類似度テーブル記憶部と、
同義語侯補ペア中の同義語侯補表現同士が省略語関係にあるか否かを判定するための省略判定モデルを記憶する省略判定モデル記憶部とを用い、
同義語侯補ペア生成手段が、入力されたテキストを解析処理し、その解析結果に基づいて前記テキストから同義語侯補表現を抽出するとともに対応する解析結果を付与し、逆変換ルール記憶部に記憶された逆変換ルール並びに音節正規化ルール記憶部に記憶された音節正規化ルールを用いて前記同義語侯補表現の表記及び読みの正規化を行った後、同義語侯補表現同士を組み合わせて一対の同義語侯補表現よりなる同義語侯補ペアを生成する工程と、
同義性判定手段が、音節類似度テーブル記憶部に記憶された音節類似度テーブル及び省略判定モデル記憶部に記憶された省略判定モデルを用いて前記同義語侯補ペア中の同義語侯補表現同士が同義か否かを判定し、同義であれば当該同義語侯補ペアを同義語ペアとして出力する工程とを含む
ことを特徴とする同義性判定方法。
A synonymy determination method that extracts, from text, synonym candidate expressions (character-string expressions that are candidates for synonyms), generates synonym candidate pairs, and determines whether the synonym candidate expressions in each synonym candidate pair are synonymous with each other, the method using:
an inverse transformation rule storage unit that stores an inverse transformation rule for normalizing the notation of a synonym candidate expression;
a syllable normalization rule storage unit that stores a syllable normalization rule for normalizing the reading of a synonym candidate expression;
a syllable similarity table storage unit that stores a syllable similarity table for obtaining the similarity between the readings of the synonym candidate expressions in a synonym candidate pair; and
an abbreviation determination model storage unit that stores an abbreviation determination model for determining whether the synonym candidate expressions in a synonym candidate pair are in an abbreviation relationship,
the method comprising: a step in which synonym candidate pair generation means analyzes the input text, extracts synonym candidate expressions from the text based on the analysis result and attaches the corresponding analysis result to each, normalizes the notation and reading of the synonym candidate expressions using the inverse transformation rule stored in the inverse transformation rule storage unit and the syllable normalization rule stored in the syllable normalization rule storage unit, and then combines the synonym candidate expressions to generate synonym candidate pairs, each consisting of a pair of synonym candidate expressions; and
a step in which synonymy determination means determines, using the syllable similarity table stored in the syllable similarity table storage unit and the abbreviation determination model stored in the abbreviation determination model storage unit, whether the synonym candidate expressions in each synonym candidate pair are synonymous and, if they are, outputs the synonym candidate pair as a synonym pair.
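The two steps of claim 5 (pair generation after normalization, then synonymy determination) can be sketched in Python as follows. This is only an illustrative outline under stated assumptions: the names `Entry`, `generate_pairs`, and `judge_pairs` are hypothetical, the normalizer and the two judgment functions are passed in as stand-ins for the stored rules, table, and model, and combining the two judgments with a logical "or" is an assumption, since the claim does not state how they are combined.

```python
from itertools import combinations
from typing import Callable, Iterable, List, NamedTuple, Tuple


class Entry(NamedTuple):
    original: str  # notation as extracted from the text
    surface: str   # notation after normalization
    reading: str   # reading after normalization


def generate_pairs(
    extracted: Iterable[Tuple[str, str]],
    normalize: Callable[[str, str], Tuple[str, str]],
) -> List[Tuple[Entry, Entry]]:
    """Normalize each (notation, reading) candidate, then form all pairs."""
    entries = []
    for surface, reading in sorted(set(extracted)):
        norm_surface, norm_reading = normalize(surface, reading)
        entries.append(Entry(surface, norm_surface, norm_reading))
    return list(combinations(entries, 2))


def judge_pairs(
    pairs: Iterable[Tuple[Entry, Entry]],
    readings_similar: Callable[[str, str], bool],
    is_abbreviation: Callable[[str, str], bool],
) -> List[Tuple[str, str]]:
    """Keep pairs judged synonymous; the 'or' combination is an assumption."""
    return [
        (a.original, b.original)
        for a, b in pairs
        if readings_similar(a.reading, b.reading)
        or is_abbreviation(a.surface, b.surface)
    ]
```

Each entry keeps the original notation next to its normalized form, so that two expressions which normalize to the same string are still reported as a pair of distinct original expressions.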
請求項5記載の同義性判定方法において、
名詞を利用する際に一般的に挿入される接辞形を削除するルール、愛称を作成する際に利用される繰り返し表現を削除するルールを少なくとも含む、表記を正規化し、当該表記の正規化に併せて読みを正規化する逆変換ルールを用い、
同義語侯補ペア生成工程は、
同義語侯補表現の解析結果中の表記に前記逆変換ルールを適用して少なくとも接辞形と繰り返し表現を削除し、当該削除に併せて読みを訂正する正規化処理を行う工程を含む
ことを特徴とする同義性判定方法。
The synonymy determination method according to claim 5, wherein
an inverse transformation rule is used that normalizes the notation and, together with that normalization, normalizes the reading, the rule including at least a rule for deleting affix forms commonly inserted when nouns are used and a rule for deleting repeated expressions used when creating nicknames, and
the synonym candidate pair generation step includes
a step of performing normalization processing that applies the inverse transformation rule to the notation in the analysis result of each synonym candidate expression to delete at least affix forms and repeated expressions, and corrects the reading in accordance with the deletion.
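A minimal Python sketch of the two kinds of inverse transformation rule named in claim 6 is given below. The concrete rules are assumptions made for illustration only: the suffix list in `AFFIX_RULES` and the "exactly doubled string" test for repetition are hypothetical and are not taken from the claim text. The point illustrated is that each deletion in the notation is paired with the matching correction of the reading.

```python
import re
from typing import Tuple

# Hypothetical affix rules: (suffix in the notation, suffix in the reading).
AFFIX_RULES = [("ちゃん", "チャン"), ("さん", "サン")]

# Matches a string that consists of some unit written twice in a row.
DOUBLED = re.compile(r"(.+)\1")


def strip_affix(surface: str, reading: str) -> Tuple[str, str]:
    """Delete a known suffix from the notation and the reading together."""
    for s_suffix, r_suffix in AFFIX_RULES:
        if (len(surface) > len(s_suffix)
                and surface.endswith(s_suffix)
                and reading.endswith(r_suffix)):
            return surface[:-len(s_suffix)], reading[:-len(r_suffix)]
    return surface, reading


def strip_repetition(surface: str, reading: str) -> Tuple[str, str]:
    """Collapse a doubled nickname form (XX becomes X) in both fields."""
    m_surface = DOUBLED.fullmatch(surface)
    m_reading = DOUBLED.fullmatch(reading)
    if m_surface and m_reading:
        return m_surface.group(1), m_reading.group(1)
    return surface, reading


def inverse_transform(surface: str, reading: str) -> Tuple[str, str]:
    """Apply affix deletion, then repetition deletion."""
    return strip_repetition(*strip_affix(surface, reading))
```

Under these assumed rules, `inverse_transform("ミキちゃん", "ミキチャン")` and `inverse_transform("ミキミキ", "ミキミキ")` both reduce to the same notation and reading, so the two expressions become directly comparable.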
請求項5記載の同義性判定方法において、
読みの母音連続や長音、促音に対して適用することで和語と外来語とで異なる読みの長さの単位、口語表現、音訳時のゆれを少なくとも正規化する音節正規化ルールを用い、
同義語侯補ペア生成工程は、
同義語侯補表現の解析結果中の読み及び逆変換ルールによって正規化された表記の読みの母音連続や長音、促音に前記音節正規化ルールを適用して正規化する正規化処理を行う工程を含む
ことを特徴とする同義性判定方法。
The synonymy determination method according to claim 5, wherein
a syllable normalization rule is used that, by being applied to vowel sequences, long vowels, and geminate consonants (sokuon) in a reading, normalizes at least differences in the unit of reading length between native Japanese words and loanwords, colloquial expressions, and variation arising in transliteration, and
the synonym candidate pair generation step includes
a step of performing normalization processing that applies the syllable normalization rule to the vowel sequences, long vowels, and geminate consonants in the readings in the analysis result of each synonym candidate expression and in the readings of notations normalized by the inverse transformation rule.
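The syllable normalization of claim 7 can be illustrated with a small ordered list of rewrite rules over katakana readings. The three rules below are assumptions chosen for illustration (dropping the long-vowel mark, dropping the geminate marker, and shortening an o-row kana followed by ウ); the claim does not enumerate the actual rules, and `SYLLABLE_RULES` and `normalize_reading` are hypothetical names.

```python
import re

# Hypothetical rules, applied in order: (pattern, replacement).
SYLLABLE_RULES = [
    (re.compile("ー"), ""),                       # long-vowel mark
    (re.compile("ッ"), ""),                       # geminate consonant marker
    (re.compile("([オコソトノホモヨロョ])ウ"), r"\1"),  # "ou" vowel sequence
]


def normalize_reading(reading: str) -> str:
    """Apply each rewrite rule in turn to a katakana reading."""
    for pattern, replacement in SYLLABLE_RULES:
        reading = pattern.sub(replacement, reading)
    return reading
```

With these assumed rules, the transliteration variants コンピューター and コンピュータ normalize to the same reading, which is the kind of variation the claim says the rule is meant to absorb.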
請求項5記載の同義性判定方法において、
「表記は異なるが読みが類似する音節ペア」をキーとし、「距離」を値とした、同義語侯補ペア中の同義語侯補表現同士の読みの類似度を求めるための音節類似度テーブルを用い、
同義語判定工程は、
同義語候補ペア中の各同義語侯補表現の読みの、音節位置が同じで読みが異なる音節間の距離の総和を前記音節類似度テーブルを用いて求め、当該距離の総和が予め設定した閾値より小さければ同義と判定する工程を含む
ことを特徴とする同義性判定方法。
The synonymy determination method according to claim 5, wherein
a syllable similarity table is used for obtaining the similarity between the readings of the synonym candidate expressions in a synonym candidate pair, the table having "a pair of syllables that differ in notation but are similar in reading" as a key and a "distance" as a value, and
the synonymy determination step includes
a step of obtaining, using the syllable similarity table, the sum of the distances between syllables that occupy the same syllable position but differ in reading in the readings of the synonym candidate expressions in the synonym candidate pair, and determining the pair to be synonymous if the sum of the distances is smaller than a preset threshold.
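The threshold test of claim 8 can be sketched as below. The table entries, the distances, and the threshold value are invented for illustration. Two further points are assumptions that the claim leaves open: syllable pairs missing from the table are given infinite distance, and readings with different syllable counts are treated as not comparable. The syllable splitter is also a simplification (one kana plus an optional small kana).

```python
import math
import re
from typing import Dict, FrozenSet, List

# One kana optionally followed by a small kana counts as one syllable.
SYLLABLE = re.compile(r".[ァィゥェォャュョ]?")

# Hypothetical table: unordered syllable pair -> distance.
SIMILARITY_TABLE: Dict[FrozenSet[str], float] = {
    frozenset(("バ", "ヴァ")): 0.1,
    frozenset(("ジ", "ヂ")): 0.1,
    frozenset(("ド", "ト")): 0.5,
}


def split_syllables(reading: str) -> List[str]:
    return SYLLABLE.findall(reading)


def reading_distance(a: str, b: str) -> float:
    """Sum table distances over positions where the syllables differ."""
    syl_a, syl_b = split_syllables(a), split_syllables(b)
    if len(syl_a) != len(syl_b):
        return math.inf  # assumption: different lengths are not comparable
    return sum(
        SIMILARITY_TABLE.get(frozenset((x, y)), math.inf)  # assumption: unlisted = inf
        for x, y in zip(syl_a, syl_b)
        if x != y
    )


def is_synonymous(a: str, b: str, threshold: float = 1.0) -> bool:
    """Synonymous when the summed distance is below the preset threshold."""
    return reading_distance(a, b) < threshold
```

Storing each key as a `frozenset` makes the lookup independent of the order in which the two syllables are compared.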
コンピュータを、請求項1乃至4のいずれかに記載の同義性判定装置の各手段として機能させるためのプログラム。
A program for causing a computer to function as each means of the synonymy determination apparatus according to any one of claims 1 to 4.
請求項9に記載のプログラムを記録したコンピュータ読み取り可能な記録媒体。
A computer-readable recording medium on which the program according to claim 9 is recorded.
JP2008065256A 2008-03-14 2008-03-14 Synonymity determination device, method, program, and recording medium Expired - Fee Related JP5094486B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2008065256A JP5094486B2 (en) 2008-03-14 2008-03-14 Synonymity determination device, method, program, and recording medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2008065256A JP5094486B2 (en) 2008-03-14 2008-03-14 Synonymity determination device, method, program, and recording medium

Publications (2)

Publication Number Publication Date
JP2009223463A JP2009223463A (en) 2009-10-01
JP5094486B2 JP5094486B2 (en) 2012-12-12

Family

ID=41240199

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2008065256A Expired - Fee Related JP5094486B2 (en) 2008-03-14 2008-03-14 Synonymity determination device, method, program, and recording medium

Country Status (1)

Country Link
JP (1) JP5094486B2 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05282293A (en) * 1992-03-31 1993-10-29 Matsushita Electric Ind Co Ltd Word processor
JPH10177575A (en) * 1996-10-15 1998-06-30 Ricoh Co Ltd Device and method for extracting word and phrase and information storing medium
JP2002269134A (en) * 2001-03-09 2002-09-20 Ricoh Co Ltd Method and device for processing character string and information retrieval system

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011107751A (en) * 2009-11-12 2011-06-02 Aisin Aw Co Ltd Spot search device and program
JP2011180862A (en) * 2010-03-02 2011-09-15 Nippon Telegr & Teleph Corp <Ntt> Method and device of extracting term, and program
JP2012108795A (en) * 2010-11-18 2012-06-07 Ntt Docomo Inc Synonym determination device, synonym determination method and program
JP2013016011A (en) * 2011-07-04 2013-01-24 Nippon Telegr & Teleph Corp <Ntt> Synonym dictionary generation device, method therefor, and program
JP2014006620A (en) * 2012-06-22 2014-01-16 Yahoo Japan Corp Synonym estimation device, synonym estimation method, and synonym estimation program
JP2014006621A (en) * 2012-06-22 2014-01-16 Yahoo Japan Corp Synonym estimation device, synonym estimation method, and synonym estimation program
JP2015106361A (en) * 2013-12-02 2015-06-08 株式会社日立製作所 Data retrieval system and data retrieval method
JP2016091344A (en) * 2014-11-06 2016-05-23 日本電気株式会社 Orthographically variant word determination apparatus, orthographically variant word determination method, orthographically variant word determination program, and document analysis device
KR101769035B1 (en) * 2016-03-28 2017-08-18 울산과학기술원 Korean text clustering system and method
JP2018010543A (en) * 2016-07-15 2018-01-18 株式会社トヨタマップマスター Notation fluctuation glossary creation device, retrieval system, methods thereof, computer program thereof and recording medium recording computer program thereof
JP2020135877A (en) * 2019-02-18 2020-08-31 ネイバー コーポレーションNAVER Corporation Method and system for automatic extraction of synonymous loan word using transliteration model
JP7014830B2 (en) 2019-02-18 2022-02-01 ネイバー コーポレーション Methods and systems for automatically extracting foreign synonyms using a transliteration model
CN112395867A (en) * 2020-11-16 2021-02-23 中国平安人寿保险股份有限公司 Synonym mining method, synonym mining device, synonym mining storage medium and computer equipment
CN112395867B (en) * 2020-11-16 2023-08-08 中国平安人寿保险股份有限公司 Synonym mining method and device, storage medium and computer equipment
WO2022168208A1 (en) * 2021-02-03 2022-08-11 日本電気株式会社 Information processing device, conversion pattern determination method, entity matching method, learning method, conversion pattern determination program, entity matching program, and learning program

Also Published As

Publication number Publication date
JP5094486B2 (en) 2012-12-12

Similar Documents

Publication Publication Date Title
JP5094486B2 (en) Synonymity determination device, method, program, and recording medium
CN108140019B (en) Language model generation device, language model generation method, and recording medium
TW448381B (en) Automatic segmentation of a text
JP3768205B2 (en) Morphological analyzer, morphological analysis method, and morphological analysis program
US9280967B2 (en) Apparatus and method for estimating utterance style of each sentence in documents, and non-transitory computer readable medium thereof
Pennell et al. Normalization of text messages for text-to-speech
JP2003514304A5 (en)
JP5524138B2 (en) Synonym dictionary generating apparatus, method and program thereof
Nicolai et al. Leveraging Inflection Tables for Stemming and Lemmatization.
JP6427466B2 (en) Synonym pair acquisition apparatus, method and program
Etxeberria et al. Evaluating the noisy channel model for the normalization of historical texts: Basque, Spanish and Slovene
Abate et al. Development of Amharic morphological analyzer using memory-based learning
Gu et al. Markov modeling of mandarin Chinese for decoding the phonetic sequence into Chinese characters
JP5853595B2 (en) Morphological analyzer, method, program, speech synthesizer, method, program
JP6718787B2 (en) Japanese speech recognition model learning device and program
KR101663038B1 (en) Entity boundary detection apparatus in text by usage-learning on the entity's surface string candidates and mtehod thereof
Etxeberria et al. Weighted finite-state transducers for normalization of historical texts
Taji et al. The columbia university-new york university abu dhabi sigmorphon 2016 morphological reinflection shared task submission
JP5523929B2 (en) Text summarization apparatus, text summarization method, and text summarization program
JP4478042B2 (en) Word set generation method with frequency information, program and program storage medium, word set generation device with frequency information, text index word creation device, full-text search device, and text classification device
Hsieh et al. Correcting Chinese spelling errors with word lattice decoding
Asahiah Development of a Standard Yorùbá digital text automatic diacritic restoration system
Hahn et al. Optimizing CRFs for SLU tasks in various languages using modified training criteria
JP2009176148A (en) Unknown word determining system, method and program
KR19980047177A (en) Korean document analyzer for voice conversion system

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20100113

RD04 Notification of resignation of power of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7424

Effective date: 20110613

RD02 Notification of acceptance of power of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7422

Effective date: 20110614

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A821

Effective date: 20110615

RD03 Notification of appointment of power of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7423

Effective date: 20110616

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20120629

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20120709

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20120829

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20120918

A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20120918

R150 Certificate of patent or registration of utility model

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20150928

Year of fee payment: 3

S531 Written request for registration of change of domicile

Free format text: JAPANESE INTERMEDIATE CODE: R313531

R350 Written notification of registration of transfer

Free format text: JAPANESE INTERMEDIATE CODE: R350

LAPS Cancellation because of no payment of annual fees