JP2002032369A

JP2002032369A - Dictionary-preparing device

Info

Publication number: JP2002032369A
Application number: JP2000216756A
Authority: JP
Inventors: Sayori Shimohata; さより下畑
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2000-07-18
Filing date: 2000-07-18
Publication date: 2002-01-31

Abstract

PROBLEM TO BE SOLVED: To realize a dictionary-preparing device capable of preparing a parallel-translation (bilingual) dictionary even from text in a single language. SOLUTION: A conversion table 33 describes the target-language part of speech corresponding to the part of speech of a source-language word string, together with a conversion rule from the source-language word string into a target-language word string. An extraction processing part 21 extracts a word string from text in the source language. A part-of-speech attaching part 22 attaches a part of speech to the word string. A conversion processing part 23 refers to the table 33 and, according to the conversion rule for the attached part of speech, converts the source-language word string into a target-language word string and outputs it as dictionary data.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a dictionary creation device that automatically creates a dictionary for use in a machine translation system or the like.

[0002]

[Description of the Related Art] With the spread of the Internet and the growing number of personal computer users, machine translation systems that translate text written in one language into another have spread rapidly. However, with a typical machine translation system, the dictionary supplied with the system alone cannot adequately translate text from diverse fields, so building a dictionary for each user or each field has been essential. Creating such dictionaries takes considerable effort. Methods have therefore been proposed for automatically creating a machine translation dictionary from the text to be translated itself, or from text data in the same field as that text.

[0003] One such prior art is the method disclosed in Japanese Unexamined Patent Application Publication No. H10-269222, “Dictionary creation support device in machine translation”. This method extracts character strings written in katakana from Japanese text data, obtains a set of English spelling candidates from each katakana string using a conversion table, and collates that set of candidates against English text data, thereby assigning a translation automatically.

[0004]

[Problems to Be Solved by the Invention] However, the above prior art has the following problems. First, text data in two languages (here, Japanese and English) is always required. Second, because English translation candidates are inferred from Japanese words written in katakana, Japanese words that are not written in katakana cannot be registered. Third, when the English text data contains no word that matches the set of spelling candidates, no translation can be obtained.

[0005] Thus, the prior art has the problem that a bilingual dictionary cannot be created from monolingual text.

[0006]

[Means for Solving the Problems] The present invention adopts the following structures to solve the above problems.
<Structure 1> A dictionary creation device comprising: an extraction processing unit that extracts a word string from text in a source language; a part-of-speech assigning unit that assigns a part of speech to the word string extracted by the extraction processing unit; a conversion table that describes a target-language part of speech corresponding to the part of speech of a source-language word string, and a conversion rule from the source-language word string to a target-language word string; and a conversion processing unit that, referring to the conversion rules of the conversion table, converts the word string extracted by the extraction processing unit into a target-language word string based on the part of speech assigned by the part-of-speech assigning unit, and outputs the target-language word string as dictionary data for the source language.

[0007] <Structure 2> The dictionary creation device according to Structure 1, wherein each conversion rule of the conversion table consists of a replacement for the source-language word and a function word of the target language.

[0008] <Structure 3> The dictionary creation device according to Structure 2, wherein the replacement for the source-language word is a rule that uses the source-language word as is.

[0009] <Structure 4> The dictionary creation device according to Structure 2, wherein the replacement for the source-language word is a rule that writes the source-language word in phonetic characters.

[0010]

[Embodiments of the Invention] <<Overview of the Invention>> When technical documents and the like are translated into Japanese as the target language, technical terms are often rendered either by using the source-language word as is or by merely transcribing it into katakana, in order to avoid mistranslation and eliminate ambiguity. The present invention therefore provides a dictionary creation device that extracts words or word strings from text data in a single language and converts them into the target language using a conversion table for the target language, thereby generating their translations automatically and creating a dictionary.

[0011] Embodiments of the present invention are described in detail below using a specific example.
<<Specific Example>> <Configuration> FIG. 1 is a configuration diagram showing a specific example of the dictionary creation device of the present invention. The illustrated device has an input/output device 1, a processing device 2, and a storage device 3. The input/output device 1 consists of input means, such as a keyboard, a mouse, and files, for entering text data and various operation commands, and output means, such as a CRT for displaying the progress of processing and output files. The storage device 3 consists of a magnetic disk device, semiconductor memory, or the like, and holds a text file 31 that stores source-language text, a work file 32 that saves the processing results of each stage, a conversion table 33 that describes the rules for converting from the source language into Japanese (the target language), and a dictionary file 34 that saves the created dictionary.

[0012] As shown in FIGS. 4 and 5, described later, the work file 32 consists of a source-language part-of-speech storage unit 321 that stores source-language parts of speech, a source-language word storage unit 322 that stores source-language word strings, a Japanese part-of-speech storage unit 323 that stores Japanese parts of speech, and a Japanese word storage unit 324 that stores Japanese word strings. As shown in FIG. 6, described later, the conversion table 33 consists of a source-language part-of-speech storage unit 331 that stores source-language parts of speech, a Japanese part-of-speech storage unit 332 that stores Japanese parts of speech, and a conversion rule storage unit 333 that stores the rules for conversion into Japanese. As shown in FIG. 7, described later, the dictionary file 34 consists of a source-language part-of-speech storage unit 341 that stores source-language parts of speech, a source-language word storage unit 342 that stores source-language word strings, a Japanese part-of-speech storage unit 343 that stores Japanese parts of speech, and a Japanese word storage unit 344 that stores Japanese word strings.
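The three stores described above can be pictured as simple record types. The following is a minimal sketch only: the class and field names (`WorkRow`, `ConversionRule`, `DictionaryEntry`, `source_pos`, and so on) are hypothetical and do not appear in the specification; only the reference numerals in the comments come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkRow:
    """One line of work file 32 (names are illustrative assumptions)."""
    source_pos: Optional[str] = None    # storage unit 321
    source_words: str = ""              # storage unit 322
    target_pos: Optional[str] = None    # storage unit 323
    target_words: Optional[str] = None  # storage unit 324


@dataclass
class ConversionRule:
    """One line of conversion table 33."""
    source_pos: str  # storage unit 331
    target_pos: str  # storage unit 332
    rule: str        # storage unit 333


@dataclass
class DictionaryEntry:
    """One line of dictionary file 34."""
    source_pos: str    # storage unit 341
    source_words: str  # storage unit 342
    target_pos: str    # storage unit 343
    target_words: str  # storage unit 344


# State of the work file right after extraction (as described for FIG. 4):
# only the source-language word strings are filled in.
work_file = [WorkRow(source_words=w) for w in ("URL prefix", "map", "additional")]
```

The optional fields of `WorkRow` reflect that the work file is filled in stage by stage: word strings first, then the source part of speech, then the Japanese part of speech and translation.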

[0013] The processing device 2 consists of an arithmetic unit, memory, a control unit, and the like, and has the function of executing the translation pattern creation process. The processing device 2 includes an extraction processing unit 21, a part-of-speech assigning unit 22, and a conversion processing unit 23. The extraction processing unit 21 extracts the words and word strings to be registered in the dictionary (hereinafter, registration candidates) from the source-language text file 31 and stores them in the source-language word storage unit 322 of the work file 32. The part-of-speech assigning unit 22 reads the contents of the source-language word storage unit 322 of the work file 32 line by line, estimates the part of speech of each, and stores it in the Japanese part-of-speech storage unit 323. The conversion processing unit 23 reads the pairs of registration-candidate word string and part of speech stored in the work file 32 line by line, converts each into a translation based on the conversion rule for the corresponding part of speech in the conversion table 33, and stores it in the dictionary file 34.

[0014] <Operation> FIG. 2 is a flowchart showing the flow of processing in the specific example. FIG. 3 is a flowchart showing the details of step S5 in the flowchart of FIG. 2. In this specific example, each stage of processing is described below using a concrete case in which an English-Japanese bilingual dictionary is created from English text.

[0015]
[Step S1] English text is entered from the input/output device 1, and the extraction processing unit 21 stores it in the text file 31. The text may be typed directly from the keyboard or supplied by specifying a file.
[Step S2] The extraction processing unit 21 reads the text file 31 and extracts word strings that are candidates for dictionary registration. Methods of extracting word strings include performing morphological or syntactic analysis and taking out word strings that have the desired part of speech or syntactic structure (for example, noun phrases), and computing the frequency of occurrence of word strings in the text and extracting those that occur frequently.
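The frequency-based alternative mentioned for step S2 can be sketched as follows. This is an illustrative assumption rather than the method of the specification: the function name `extract_candidates`, the n-gram length limit, and the threshold `min_count` are all hypothetical, and tokenization is reduced to a simple regular expression.

```python
import re
from collections import Counter


def extract_candidates(text, max_len=2, min_count=2):
    """Return word strings of 1..max_len words that occur at least min_count times."""
    counts = Counter()
    # Count n-grams sentence by sentence so that no candidate spans a sentence boundary.
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[A-Za-z]+", sentence)
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    # Most frequent candidates first.
    return [ngram for ngram, count in counts.most_common() if count >= min_count]


text = "Type the URL prefix you want to map. The URL prefix is shown in the list."
candidates = extract_candidates(text)
```

With this sample text, “URL prefix” is among the candidates because it occurs twice. A pure frequency count also returns function words such as “the”, so a practical extractor would presumably combine it with the part-of-speech or syntactic filtering that the text describes as the other option.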

[0016]
[Step S3] The extraction processing unit 21 stores the extracted word strings in the source-language word storage unit 322 of the work file 32. FIG. 4 is an explanatory diagram showing the contents of the work file 32 after the extraction process. In this example, the extracted words “URL prefix”, “map”, and “additional” are stored in the source-language word storage unit 322.

[0017]
[Step S4] The part-of-speech assigning unit 22 reads the contents of the source-language word storage unit 322 of the work file 32 line by line and estimates the part of speech of each word string. Ordinary morphological analysis or syntactic analysis is used to assign the part of speech. The estimated part of speech is stored in the source-language part-of-speech storage unit 321 of the work file 32. FIG. 5 shows the contents of the work file 32 after the part-of-speech assignment process. As illustrated, “URL prefix” is assigned noun, “map” is assigned verb, and “additional” is assigned adjective.

[0018]
[Step S5] The conversion processing unit 23 reads the contents of the source-language word storage unit 322 and the source-language part-of-speech storage unit 321 of the work file 32 line by line, and compares the part of speech read from the source-language part-of-speech storage unit 321 against the parts of speech in the source-language part-of-speech storage unit 331 of the conversion table 33. If a match is found, the word string read from the source-language word storage unit 322 of the work file 32 is converted into Japanese according to the corresponding conversion rule.

[0019] FIG. 6 is an explanatory diagram showing an example of the conversion table 33. Here, “***” in the conversion rule storage unit 333 indicates that the English character string is substituted into the translation as is. Alternatively, the result of converting the English into katakana, or the result of translating the English with a machine translation system, may be substituted instead. In this way, each entry in the conversion rule storage unit 333 of the conversion table 33 consists of a replacement for the source-language word and a Japanese function word, that is, a word expressing the grammatical function of the part of speech, such as 「する」 for a verb.
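A conversion rule of this form can be treated as a template in which “***” is a placeholder. The sketch below is illustrative only: the function name `apply_rule` and the `KATAKANA` lookup are hypothetical, and the rule strings “***する” and “***だ” are inferred from the results described for FIG. 7, since the contents of FIG. 6 are not reproduced in the text.

```python
PLACEHOLDER = "***"

# Hypothetical katakana lookup, used only to illustrate the optional
# "convert the English into katakana" variant mentioned in the text.
KATAKANA = {"map": "マップ"}


def apply_rule(rule, source_words, replace=lambda s: s):
    """Fill the placeholder in `rule` with the (optionally transformed) source word string."""
    return rule.replace(PLACEHOLDER, replace(source_words))


as_is = apply_rule("***する", "map")
in_katakana = apply_rule("***する", "map", replace=lambda s: KATAKANA.get(s, s))
```

Passing the replacement step as a function keeps the rule itself unchanged whether the source word is kept as is, transcribed, or machine translated; only the function word (here 「する」) is fixed by the rule.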

[0020] Next, the processing of the conversion processing unit 23 in step S5 is described in detail with reference to FIG. 3.
[Step S51] Check whether the end of the data in the work file 32 has been reached. If so, processing ends. Otherwise, proceed to step S52.
[Step S52] Read one line of data from the work file 32.
[Step S53] Compare the part of speech in the source-language part-of-speech storage unit 321 of the work file 32 against the parts of speech in the source-language part-of-speech storage unit 331 of the conversion table 33.
[Steps S54 to S55] If there is a matching part of speech, generate a translation according to the conversion rule that corresponds to the matching source-language part of speech in the conversion table 33.
[Step S56] Store the Japanese part of speech that corresponds to the matching source-language part of speech in the conversion table 33, and the generated translation, in the Japanese part-of-speech storage unit 323 and the Japanese word storage unit 324 of the work file 32, respectively.
[Step S57] If there is no matching part of speech, perform error handling and return to step S51.
The above is the detailed operation of step S5 in FIG. 2.
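The loop of steps S51 to S57 can be sketched as follows, under stated assumptions: the table contents are inferred from the results described for FIG. 7, rows are plain dictionaries with hypothetical keys, and “error handling” is reduced to collecting the unmatched rows, which the specification does not detail.

```python
# Conversion table 33 as a mapping: source POS -> (Japanese POS, conversion rule).
# The entries are assumptions inferred from the results described for FIG. 7.
CONVERSION_TABLE = {
    "noun": ("名詞", "***"),
    "verb": ("動詞", "***する"),
    "adjective": ("形容動詞", "***だ"),
}


def convert_work_file(rows, table):
    """Convert each work-file row; return (converted rows, rows that matched no rule)."""
    converted, errors = [], []
    for row in rows:                          # S51/S52: read lines until the data ends
        entry = table.get(row["source_pos"])  # S53: look up the source part of speech
        if entry is None:                     # S57: no match, record it and continue
            errors.append(row)
            continue
        target_pos, rule = entry
        target_words = rule.replace("***", row["source_words"])  # S54-S55: apply the rule
        # S56: keep the Japanese part of speech and the translation with the row
        converted.append({**row, "target_pos": target_pos, "target_words": target_words})
    return converted, errors


rows = [
    {"source_pos": "noun", "source_words": "URL prefix"},
    {"source_pos": "verb", "source_words": "map"},
    {"source_pos": "adjective", "source_words": "additional"},
    {"source_pos": "adverb", "source_words": "quickly"},  # hypothetical row with no rule
]
converted, errors = convert_work_file(rows, CONVERSION_TABLE)
```

The first three rows reproduce the translations “URL prefix”, “mapする”, and “additionalだ” given in the text; the fourth row is an invented case that exercises the no-match branch.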

[0021] Returning to FIG. 2, the description of the processing in the specific example continues.
[Step S6] The registration candidates stored in the work file 32, their translations, and the part of speech of each are stored in the dictionary file 34.
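Step S6 amounts to writing the completed rows out in the four-column layout of the dictionary file 34. The sketch below assumes a tab-separated text format with a header line and assumes that rows without a translation are skipped; neither the file format nor that policy is specified in the text, and the column names are hypothetical.

```python
import csv
import io

FIELDS = ["source_pos", "source_words", "target_pos", "target_words"]


def write_dictionary(rows, stream):
    """Write completed work-file rows to `stream` as tab-separated dictionary entries."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Assumption: a row that never received a translation is not registered.
        if row.get("target_words"):
            writer.writerow({key: row[key] for key in FIELDS})


rows = [
    {"source_pos": "noun", "source_words": "URL prefix",
     "target_pos": "名詞", "target_words": "URL prefix"},
    {"source_pos": "verb", "source_words": "map",
     "target_pos": "動詞", "target_words": "mapする"},
    {"source_pos": "adverb", "source_words": "quickly"},  # hypothetical unconverted row
]
buffer = io.StringIO()  # stands in for dictionary file 34
write_dictionary(rows, buffer)
dictionary_text = buffer.getvalue()
```

An in-memory buffer is used here so the example has no side effects; an opened file object could be passed to `write_dictionary` in the same way.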

[0022] Next, the flow of the conversion process is described concretely using the work file 32 data of FIG. 5 and the conversion table 33 data of FIG. 6.
[Step S52] First, one line of data is read from the work file 32. The part of speech “noun” and the English word string “URL prefix” are read.
[Step S53] Next, the part of speech “noun” that was read is compared against the parts of speech in the source-language part-of-speech storage unit 331 of the conversion table 33.
[Steps S54 to S55] Because “noun” is present in the conversion table 33, a translation is generated according to the corresponding conversion rule “***”. Since “***” means that the English word string is substituted as is, the English word string itself becomes the translation.
[Step S56] The Japanese part of speech “noun” corresponding to the English part of speech “noun”, and the translation, are stored in the Japanese part-of-speech storage unit 323 and the Japanese word storage unit 324 of the work file 32, respectively.
Processing then returns to step S51, and the same processing is repeated for the data on the second and third lines. Once all the data has been processed, the contents stored in the work file 32 are stored in the dictionary file 34.

[0023] FIG. 7 is an explanatory diagram showing the contents of the dictionary data generated from the data of FIGS. 5 and 6. As illustrated, for the words “URL prefix”, “map”, and “additional” in the source-language word storage unit 342, the source-language part-of-speech storage unit 341 stores “noun”, “verb”, and “adjective”; the Japanese part-of-speech storage unit 343 stores “noun”, “verb”, and “adjectival noun” (形容動詞); and the Japanese word storage unit 344 stores “URL prefix”, “mapする”, and “additionalだ”.

[0024] FIG. 8 is an explanatory diagram showing how the translation result of a machine translation system changes depending on whether the dictionary data of FIG. 7 is used. Here, “Type the URL prefix you want to map.” is given as input sentence 81. Result 82, obtained by translating input sentence 81 with the standard dictionary alone, is “あなたが地図を作りたいＵＲＬ接頭辞をタイプしなさい。”, in which “map” is rendered as “make a map” and “prefix” as the grammatical term 接頭辞. Result 83, obtained with the standard dictionary together with the dictionary of FIG. 7, is “あなたがmapしたいＵＲＬ prefixをタイプしなさい。”. For “map” in input sentence 81, a general English-Japanese dictionary offers translations such as 「地図を作る」 (make a map), 「位置付ける」 (position), and 「見つけ出す」 (find out), none of which is appropriate as a translation in input sentence 81. In addition, “prefix” here is part of the single unit “URL prefix”. In a URL such as http://www.abcd.com/English/index.html, this unit refers to a portion other than the file name “index.html”, that is, a portion such as www.abcd.com/English, English, or http://www.abcd.com, so 「ＵＲＬ接頭辞」 does not accurately convey the meaning of “URL prefix”. In such cases, it is therefore preferable to use the source-language words as is in order to avoid mistranslation.

[0025] The above example described conversion from English into Japanese, but the present invention can be applied to translation between any two languages. For example, when converting Japanese into English, translations are generated using a conversion table such as the one in FIG. 9, which is an explanatory diagram showing an example of the conversion table when the source language is Japanese. The translation may be generated as the Japanese word unchanged, or a step that replaces the word's reading with romanized letters may be added. FIG. 10 is an explanatory diagram of the generated dictionary data. In this example, “yokozuna” and “ozeki” are registered as the English words for the Japanese nouns 「横綱」 and 「大関」.
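The Japanese-to-English direction can be sketched in the same way. Everything named here is an assumption for illustration: the `READINGS` lookup stands in for whatever step supplies romanized readings, and the rule that wraps the word in quotation marks is inferred from the quoted forms shown in the text, since the contents of FIG. 9 are not reproduced.

```python
# Hypothetical romanized readings; a real system would obtain these from a
# morphological analyzer or a kana-to-romaji conversion step.
READINGS = {"横綱": "yokozuna", "大関": "ozeki"}

# Assumed table for a Japanese source language: Japanese POS -> (English POS, rule).
JA_EN_TABLE = {"名詞": ("noun", '"***"')}


def to_english_entry(source_pos, source_words, romanize=True):
    """Build one dictionary entry: (source POS, source words, target POS, target words)."""
    target_pos, rule = JA_EN_TABLE[source_pos]
    # Either romanize the reading or keep the Japanese word unchanged.
    replacement = READINGS.get(source_words, source_words) if romanize else source_words
    return (source_pos, source_words, target_pos, rule.replace("***", replacement))


entries = [to_english_entry("名詞", word) for word in ("横綱", "大関")]
unchanged = to_english_entry("名詞", "横綱", romanize=False)
```

The `romanize` flag mirrors the two options in the text: generating the Japanese word unchanged, or adding the step that replaces the reading with romanized letters.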

[0026] FIG. 11 is an explanatory diagram showing how the translation result of a machine translation system changes depending on whether the dictionary data of FIG. 10 is used. Here, 「横綱が大関に勝った。」 (the yokozuna defeated the ozeki) is given as input sentence 91. Result 92, obtained by translating input sentence 91 with the standard dictionary alone, is “A grand champion sumo wrestler defeated a sumo wrestler of the second highest rank.” Result 93, obtained with the standard dictionary together with the dictionary of FIG. 10, is “Yokozuna” defeated “ozeki”. Spelling the words close to their original pronunciation, as in “yokozuna” and “ozeki”, and enclosing them in quotation marks makes it clear to the reader that they are foreign (non-English) words. It also prevents mistranslations in the translation process, such as 「横綱」 being split into 「横」 and 「綱」 and translated piece by piece.

[0027] <Effects> As described above, in this specific example a conversion table describing the target-language part of speech corresponding to the part of speech of a source-language word string, together with conversion rules, is used to convert each source-language word string into a target-language word string according to its conversion rule, and the result is output as dictionary data for the source language. A bilingual dictionary for use in machine translation and the like can therefore be created automatically from monolingual text. In addition, when the target language is Japanese, for example, words can be registered even if they are not words written in katakana. Furthermore, because conversion rules based on the part of speech of the word are used, there is no need to prepare in advance spelling candidates for obtaining a translation from the source-language text.

[0028] Although the above specific example states that the conversion rule may use katakana notation, the rule is not limited to katakana; any rule that writes the words of various source languages in phonetic characters may be used.

[Brief description of the drawings]

FIG. 1 is a configuration diagram showing a specific example of the dictionary creation device of the present invention.

FIG. 2 is a flowchart showing the overall processing in the specific example of the dictionary creation device of the present invention.

FIG. 3 is a flowchart showing the details of the conversion process in the specific example of the dictionary creation device of the present invention.

FIG. 4 is an explanatory diagram showing the contents of the work file after the extraction process in the specific example of the dictionary creation device of the present invention.

FIG. 5 is an explanatory diagram showing the contents of the work file after the part-of-speech assignment process in the specific example of the dictionary creation device of the present invention.

FIG. 6 is an explanatory diagram showing an example of the conversion table in the specific example of the dictionary creation device of the present invention.

FIG. 7 is an explanatory diagram showing the contents of the dictionary data generated in the specific example of the dictionary creation device of the present invention.

FIG. 8 is an explanatory diagram showing how the translation result of a machine translation system changes depending on whether the dictionary data of the specific example is used.

FIG. 9 is an explanatory diagram showing an example of the conversion table when the source language is Japanese.

FIG. 10 is an explanatory diagram of the dictionary data generated when the source language is Japanese.

FIG. 11 is an explanatory diagram showing how the translation result of a machine translation system changes depending on whether the dictionary data of the specific example is used, when the source language is Japanese.

[Explanation of symbols]

21: extraction processing unit; 22: part-of-speech assigning unit; 23: conversion processing unit; 31: text file; 33: conversion table; 34: dictionary file

Claims

[Claims]

1. A dictionary creation device comprising: an extraction processing unit that extracts a word string from text in a source language; a part-of-speech assigning unit that assigns a part of speech to the word string extracted by the extraction processing unit; a conversion table that describes a target-language part of speech corresponding to the part of speech of a source-language word string, and a conversion rule from the source-language word string to a target-language word string; and a conversion processing unit that, referring to the conversion rules of the conversion table, converts the word string extracted by the extraction processing unit into a target-language word string based on the part of speech assigned by the part-of-speech assigning unit, and outputs the target-language word string as dictionary data for the source language.

2. The dictionary creation device according to claim 1, wherein each conversion rule of the conversion table consists of a replacement for the source-language word and a function word of the target language.

3. The dictionary creation device according to claim 2, wherein the replacement for the source-language word is a rule that uses the source-language word as is.

4. The dictionary creation device according to claim 2, wherein the replacement for the source-language word is a rule that writes the source-language word in phonetic characters.