JPH0855123A

JPH0855123A - Machine translation system with idiom registering function

Info

Publication number: JPH0855123A
Application number: JP6186127A
Authority: JP
Inventors: Toshiyuki Okunishi; 稔幸奥西; Youji Fukumochi; 陽士福持; Ichiko Sada; いち子佐田; Takeshi Kutsumi; 毅九津見
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1994-08-08
Filing date: 1994-08-08
Publication date: 1996-02-27

Abstract

PURPOSE: To provide a machine translation system with an idiom registration function that can handle variant (inflected) forms of the fixed part of an idiom headword and that allows user-defined symbols in the variable part. CONSTITUTION: The system has three main components.
Idiom registration means 2 registers the headword and the translation of an idiom. The fixed part of the idiom is expressed by ordinary words, ordinary word strings, or change-expansion symbols that represent variant forms of those words or word strings. The variable part is expressed in a composite form using first representative symbols, each standing for a set of words or word strings that share a given attribute.
Variant expansion means 10 generates all preset variant forms of the fixed part of the headword of an idiom to be translated.
Idiom translation means 4 identifies an input character string, or a part of it, with a registered headword or with a headword whose fixed part has been expanded into a variant form.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an electronic dictionary, to an information retrieval device equipped with an electronic dictionary, and to a machine translation device equipped with an electronic dictionary. More particularly, it relates to a dictionary search device with an idiom registration function that can register, retrieve, and translate idioms containing variable parts.

[0002]

[Prior Art] Language processing devices in practical use today include word processors, which support human document creation, and machine translation devices, which translate documents written in one language into another. Each of these devices has a dictionary that stores the information its purpose requires. A dictionary here means a large collection of entries, each consisting of a headword and the set of information attached to it. The entries are organized systematically so that a desired entry can be retrieved easily by its headword.

[0003] In principle, the dictionary is recorded in machine-readable form on a machine-readable non-volatile medium. Such a dictionary is called an electronic dictionary here. When an electronic dictionary is used for machine translation, the headwords are word strings (including single words) of the source language. The information attached to each word string includes the part of speech of the headword, its morphological attributes, its translation, and the part of speech of the translation.

[0004] Suppose a document that the user is processing or creating with such a device contains words that are not listed as headwords in the device's dictionary. Work efficiency then drops sharply, so the dictionary should contain as many headwords as possible. In machine translation, translation efficiency also benefits from adopting idioms as headwords, not only individual source-language words. Each idiom is paired with the corresponding target-language expression, and as many such pairs as possible should be registered.

[0005] Many idioms contain numerals, possessive pronouns, reflexive pronouns, and similar words whose form changes with the subject or with other words. For example, one's in [do one's best] becomes a possessive pronoun such as your, my, his, or her depending on the subject.

For efficient translation processing and dictionary development, such an idiom should not be registered by listing every instance with a concrete word filled in. Instead, it should be possible to register it in a form where part of the headword accepts any word or phrase that shares a given grammatical feature. Below, such a part is called a variable part. The remaining words or word strings, which form the skeleton of the idiom, are called the fixed part.

[0006] Some idioms contain several variable parts, and a variable part may accept a noun phrase or a sentence as well as a single word. To express such varied variable parts, a machine translation device has been proposed that introduces symbols written with a leading * (hereinafter called representative symbols).

[0007] For example, the English idiom "N times as ... as ~" has the Japanese translation 「〜よりＮ倍…」 ("N times more ... than ~"). It can be registered as follows:

English: [ *m times as *ad as *CN ]
Translation: *CN より *m 倍 *ad

In the headword (an English word string), the words prefixed with "*" (*m, *ad, *CN) are the representative symbols, and they form the variable parts. The other words (times, as) form the fixed part.

[0008] A variable part can match any word of the part of speech that its representative symbol stands for. In the example above, m means numeral, ad means adjective, C means sentence, N means noun phrase, and CN means C or N. A fixed part, by contrast, matches only a word with exactly that spelling. The representative symbols allowed in a variable part are predefined in the system.
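The split between fixed parts and variable parts can be sketched in Python. This is a minimal illustration under stated assumptions, not the system's actual data format. The symbol table covers only the symbols named in the example, and the category names (number, adjective, sentence, noun_phrase) and the function `parse_headword` are hypothetical.

```python
# Hypothetical table of system-defined representative symbols and the
# grammatical categories each accepts (symbols taken from the example above).
REPRESENTATIVE_SYMBOLS = {
    "*m": {"number"},
    "*ad": {"adjective"},
    "*C": {"sentence"},
    "*N": {"noun_phrase"},
    "*CN": {"sentence", "noun_phrase"},  # CN: C or N
}


def parse_headword(headword: str) -> list[tuple[str, str]]:
    """Split a headword into ("var", symbol) and ("fixed", word) tokens."""
    tokens = []
    for tok in headword.split():
        if tok.startswith("*"):
            # Only predefined symbols are allowed in a variable part.
            if tok not in REPRESENTATIVE_SYMBOLS:
                raise ValueError(f"undefined representative symbol: {tok}")
            tokens.append(("var", tok))
        else:
            # Anything else belongs to the fixed part, matched by spelling only.
            tokens.append(("fixed", tok))
    return tokens


print(parse_headword("*m times as *ad as *CN"))
```

Rejecting a symbol that is not in the table mirrors restriction (2) described under [0012].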

[0009] Suppose the English sentence [ This apple is three times as big as that orange. ] is input. Machine translation then proceeds as outlined below. First, each word is looked up in the dictionary, which yields word information such as:

three: numeral (m)
big: adjective (ad)
that: article (d)
orange: noun (n)

[0010] Next, idiom search and analysis are executed, and the fixed parts and variable parts are matched. At this point, part of the input sentence is found to match the idiom registered above:

- the variable part *m matches "three";
- *ad matches "big";
- *CN matches "that orange";
- "times as" and "as" match the fixed part of the idiom (the # symbol part).

[0011] Next, translations of the idiom's variable parts are generated:

*CN: あのオレンジ ("that orange")
*m: ３ ("3")
*ad: 大きい ("big")

Finally, these are combined with the translation of the fixed part to produce the Japanese sentence for the input.
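Combining the variable-part translations with the fixed part amounts to filling in the registered translation template from [0007]. The following is a minimal sketch under stated assumptions:

- the translation is stored as that space-separated template;
- removing the separator spaces is enough to produce the Japanese string, whereas a real generator would also handle particles and inflection;
- the numeral is written as ASCII 3;
- `fill_template` is a hypothetical name.

```python
import re

# Variable-part translations from the step above (numeral written as ASCII "3").
VARIABLE_TRANSLATIONS = {"*CN": "あのオレンジ", "*m": "3", "*ad": "大きい"}


def fill_template(template: str, translations: dict[str, str]) -> str:
    """Replace each representative symbol in the registered translation."""
    # Match the whole symbol (e.g. "*CN", not just "*C") before looking it up.
    filled = re.sub(r"\*[A-Za-z0-9]+", lambda m: translations[m.group(0)], template)
    # Spaces only separate tokens in the registration notation; Japanese output has none.
    return filled.replace(" ", "")


print(fill_template("*CN より *m 倍 *ad", VARIABLE_TRANSLATIONS))  # → あのオレンジより3倍大きい
```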

[0012]

[Problems to Be Solved by the Invention] However, the machine translation system described above has two restrictions:
(1) inflected forms cannot be written in the fixed part of an idiom headword; and
(2) only the system-defined representative symbols can be used in the variable part of an idiom headword.
As a result, the number of headwords registered as idioms grows, which increases storage requirements and search time. It also places a heavy burden on the user, that is, on the person who registers idioms in the dictionary.

[0013] The fixed part of an idiom headword (which consists of fixed and variable parts) is matched against the input sentence by character string alone. If only the common base form of an idiom is registered, matching therefore fails for input sentences containing:

- nouns that change form between singular and plural;
- inflected verbs, auxiliary verbs, or adjectives;
- words to which an auxiliary verb or the negative adverb not is attached.

All such variant forms therefore had to be listed as separate headwords.

[0014] Consider the English idiom "There is no ~ing.", whose Japanese translation is 『〜することはできない』 ("it is impossible to ~"). It is registered in the following form:

English: [There is no *Ving]
Translation: *Ving ことができない

Here, "*Ving" is a representative symbol denoting the progressive (-ing) form of a verb.

[0015] However, the sentence "There was no going to school yesterday." contains this idiom but is in the past tense. As a result, "There was no" in the sentence cannot match "There is no" in the fixed part of the headword. In other words, for an idiom that contains an inflecting word such as a verb, every inflected form had to be expanded and registered. In this example, idioms with the headwords [There was no] and [There will be no] had to be registered separately.
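The failure can be reproduced with a literal string comparison. This sketch simplifies conventional matching: it checks only the fixed words that come before the first variable part, and they must appear verbatim and contiguously. The name `conventional_match` is hypothetical.

```python
def conventional_match(headword: str, sentence: str) -> bool:
    """Conventional matching: the fixed words before the first variable part
    must occur verbatim and contiguously in the sentence."""
    fixed = []
    for tok in headword.split():
        if tok.startswith("*"):
            break
        fixed.append(tok)
    words = sentence.rstrip(".").split()
    n = len(fixed)
    return any(words[i:i + n] == fixed for i in range(len(words) - n + 1))


print(conventional_match("There is no *Ving", "There is no going to school today."))       # True
print(conventional_match("There is no *Ving", "There was no going to school yesterday."))  # False
```

To make the second call succeed, [There was no *Ving] would have to be registered as a separate headword.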

[0016] Only the representative symbols predefined by the machine translation system can be used in the variable part of an idiom headword. There are two reasons for this. A commercial system serves a wide range of users, and it is practically impossible to anticipate every user requirement and prepare all representative symbols in advance. Defining even highly specialized symbols in the system would also be inefficient.

[0017] Consider how such a conventional system behaves when the desired representative symbol is not defined. Take the English idiom "as ~ as ... can", whose Japanese translation is 『できる限り〜』 ("as ~ as possible"). The phrase between the two occurrences of as is either an adjective in the positive degree or a noun phrase modified by such an adjective. Suppose the system has no representative symbol for "noun phrase containing an adjective". The user then has no choice but to use the closest system-defined symbol, for example registering the idiom with just *N (noun phrase):

English: [ as *N as *n *3 can ]
Translation: できる限り *N

Here, *3 is a representative symbol denoting a nominative pronoun.

[0018] However, such an approximate registration can misrecognize sentences that do not actually contain the idiom, which lowers translation accuracy. Consider this input sentence and its intended translation:

[ I regard the person as policeman as you can see. ]
あなたがわかるので、私はその人を警官とみなす。 ("Since you can see it, I regard the person as a policeman.")

The noun phrase between the two occurrences of as contains no adjective, so this sentence is not an instance of the idiom. Under the registration above, however, policeman matches *N, and the sentence is mistakenly recognized as containing the idiom.

[0019] Moreover, an erroneous analysis may then happen to succeed. In that analysis, "as policeman as you can" (interpreted as the idiom) is the subject and see is the verb of a relative clause whose antecedent is the person. The result is the following mistranslation: 「私はできる限り警官が見る人を考慮する。」 ("I consider, as much as possible, the person whom the policeman sees.")

[0020] Because only system-defined representative symbols can be used, only approximate specifications are possible, and these can produce mistranslations the user did not intend. On the other hand, a more detailed specification often makes the headword complicated. This is especially true when several similar detailed conditions appear. Because of the two restrictions described above, conventional machine translation devices therefore suffered from two problems:

- a growing number of idiom headwords, with correspondingly greater storage requirements and search time;
- a heavy burden on the user.

[0021] The present invention was made in view of these circumstances. Its object is to provide a machine translation device with an idiom registration function that can handle variant forms of the fixed part of an idiom headword and that allows user-defined symbols in the variable part.

[0022]

[Means for Solving the Problems] Fig. 1 is a block diagram of the basic configuration of the present invention. As shown there, the invention provides a machine translation device with an idiom registration function, characterized by comprising:
input means 1 for entering character strings and symbols;
idiom registration means 2, which registers the headword and translation of an idiom. The idiom consists of a fixed part, made of predetermined words or word strings, and a variable part, which can change to any word or word string with a common attribute. The fixed part is expressed by ordinary words, word strings, or change-expansion symbols representing variant forms of those words or word strings. The variable part is expressed in a composite form using first representative symbols, each standing for a set of words or word strings that share a given attribute;
storage means 9 for storing the dictionaries and processing results needed for idiom registration and translation;
dictionary lookup/morphological analysis means 3 for decomposing an input word string into morphemes and performing grammatical analysis;
variant expansion means 10 for generating all preset variant forms of the fixed part of the headword of an idiom to be translated;
idiom translation means 4, which identifies an input character string, or a part of it, with one of two kinds of headword: a registered idiom headword, or a headword whose fixed part the variant expansion means 10 has expanded into a variant form. It then generates the translation of the character string that corresponds to the identified headword;
syntactic analysis means 5;
syntactic conversion means 6;
translated sentence generation means 7;
and output means 8 for outputting the translated sentence.

[0023] The variant expansion means 10 may expand the fixed part of an idiom headword in two ways. It may inflect the words that make up the fixed part, or it may attach an auxiliary verb or a negative adverb to the fixed part.
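Here is a sketch of the variant expansion means 10 for a fixed part that contains a form of be. The inflection table, the auxiliary list, and the function names are assumptions for illustration, because the patent does not list the variant forms it generates.

```python
# Hypothetical inflection data; a real system would take this from dictionary memory 9a.
CONJUGATIONS = {"is": ["is", "are", "was", "were"]}
BASE_FORM = {"is": "be"}
AUXILIARIES = ["will", "can", "may", "must"]


def expand_word(word: str) -> list[list[str]]:
    """All variant forms of one fixed-part word, each as a list of words."""
    if word not in CONJUGATIONS:
        return [[word]]  # non-inflecting word: only its own spelling
    forms = [[f] for f in CONJUGATIONS[word]]                # inflected forms
    forms += [[f, "not"] for f in CONJUGATIONS[word]]        # + negative adverb
    base = BASE_FORM[word]
    forms += [[aux, base] for aux in AUXILIARIES]            # + auxiliary verb
    forms += [[aux, "not", base] for aux in AUXILIARIES]     # + auxiliary and negation
    return forms


def expand_fixed_part(words: list[str]) -> list[list[str]]:
    """Cartesian product of the variant forms of every word in the fixed part."""
    variants = [[]]
    for word in words:
        variants = [v + form for v in variants for form in expand_word(word)]
    return variants


variants = expand_fixed_part(["There", "is", "no"])
print(len(variants))                            # 16
print(["There", "will", "be", "no"] in variants)  # True
```

The sketch tolerates overgeneration such as "There is not no", because a variant that never occurs in real input simply never matches.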

[0024] The device may further comprise two more means:

- representative symbol registration means 11, which registers in the storage means 9 a second representative symbol standing for words or word strings that have a newly defined attribute and attribute value;
- representative symbol expansion means 12, which expands a second representative symbol found in the variable part of the headword of an idiom to be translated into its defined attribute and attribute value.

In that case, the idiom registration means 2 registers idiom headwords, and their translations, expressed using first and/or second representative symbols entered through the input means 1. The idiom translation means 4 then compares the attribute and attribute value of the input character string, or of part of it, with the attribute and attribute value of the headword's second representative symbol, as expanded by the representative symbol expansion means 12.
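A user-defined (second) representative symbol can be modeled as a named set of attribute/value requirements. The sketch below makes several assumptions:

- the symbol `*AN`, the attribute names, and the class and function names are hypothetical;
- the attributes of an input span are assumed to come from earlier analysis.

The example revisits the "as ~ as ... can" case from [0017] and [0018].

```python
from dataclasses import dataclass, field


@dataclass
class RepresentativeSymbolStore:
    """Hypothetical store for user-defined (second) representative symbols."""
    definitions: dict[str, dict[str, object]] = field(default_factory=dict)

    def register(self, symbol: str, **attributes: object) -> None:
        # Role of representative symbol registration means 11.
        if not symbol.startswith("*"):
            raise ValueError("representative symbols start with '*'")
        self.definitions[symbol] = dict(attributes)

    def expand(self, symbol: str) -> dict[str, object]:
        # Role of representative symbol expansion means 12: symbol -> attribute/value pairs.
        return self.definitions[symbol]


def identify(store: RepresentativeSymbolStore, symbol: str,
             span_attributes: dict[str, object]) -> bool:
    """True if the input span carries every attribute value the symbol requires."""
    required = store.expand(symbol)
    return all(span_attributes.get(attr) == value for attr, value in required.items())


store = RepresentativeSymbolStore()
# Hypothetical user-defined symbol for "noun phrase containing an adjective".
store.register("*AN", category="noun_phrase", has_adjective=True)

print(identify(store, "*AN", {"category": "noun_phrase", "has_adjective": False}))  # "policeman": False
print(identify(store, "*AN", {"category": "noun_phrase", "has_adjective": True}))   # "many books": True
```

With *AN in place of *N, the headword would no longer accept "policeman", which avoids the misrecognition described in [0018].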

[0025] The storage means 9 preferably consists of three memories:

- a dictionary memory 9a, holding grammar and translation information for translating input character strings;
- a buffer memory 9b, storing the results of processing up to translation generation;
- an idiom registration memory 9c, storing the idioms registered by the idiom registration means 2.
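The three memories can be pictured as one record with three fields. This is a hypothetical in-memory layout for illustration only. The field names, the value shapes, and `register_idiom` are assumptions, and the patent places these memories in ROM or RAM rather than in Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class StorageMeans:
    """Hypothetical layout of storage means 9."""
    dictionary_memory: dict[str, dict] = field(default_factory=dict)  # 9a: grammar and translation info per word
    buffer_memory: list = field(default_factory=list)                  # 9b: intermediate results up to translation
    idiom_memory: dict[str, str] = field(default_factory=dict)         # 9c: idiom headword -> translation

    def register_idiom(self, headword: str, translation: str) -> None:
        # What idiom registration means 2 would store.
        self.idiom_memory[headword] = translation


storage = StorageMeans()
storage.register_idiom("There is no *Ving", "*Ving ことができない")
storage.dictionary_memory["orange"] = {"pos": "n", "translation": "オレンジ"}
```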

[0026] The dictionary lookup/morphological analysis means 3 preferably consists of two units:

- a part-of-speech extraction unit 3a, which decomposes the input character string into words and generates part-of-speech information for each word;
- a translation extraction unit 3b, which generates candidate translations for each word.

[0027] The idiom translation means 4 preferably consists of four units:

- an idiom search unit 4a, which searches the idiom registration memory and selects candidate idiom headwords whose expression form can match the decomposed word string;
- an idiom identification unit 4b, which narrows the candidates down to a single idiom headword. It picks the headword for which the attributes of the words or word strings at the representative-symbol positions match the attributes assigned to those symbols;
- an idiom analysis unit 4c, which parses the words or word strings that correspond to the representative symbols and builds the sentence structure of the whole idiom;
- an idiom translation generation unit 4d, which generates the translation of the idiom portion of the input word string from that sentence structure.
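Idiom search unit 4a only needs a cheap filter that runs before the stricter identification in unit 4b. The sketch below assumes that candidates are selected by checking that all fixed words of a headword appear in the input in the same order. The patent does not say how 4a searches the idiom registration memory, and the names here are hypothetical.

```python
def is_in_order(needles: list[str], haystack: list[str]) -> bool:
    """True if every needle occurs in haystack, in order (gaps allowed)."""
    it = iter(haystack)
    return all(word in it for word in needles)  # "in" consumes the iterator


def search_candidates(registry: list[str], sentence: str) -> list[str]:
    """Idiom search (role of unit 4a): keep headwords whose fixed words all
    occur in the input in the same order; variable parts are checked later (4b)."""
    words = [w.lower() for w in sentence.rstrip(".").split()]
    candidates = []
    for headword in registry:
        fixed = [t.lower() for t in headword.split() if not t.startswith("*")]
        if is_in_order(fixed, words):
            candidates.append(headword)
    return candidates


REGISTRY = ["*m times as *ad as *CN", "There is no *Ving", "as *N as *n *3 can"]
print(search_candidates(REGISTRY, "This apple is three times as big as that orange."))
```

Because the filter checks only fixed words, a candidate can still be rejected later when its variable parts do not fit.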

[0028] In Fig. 1, the input means 1 may be a keyboard or a pointing device such as a mouse, pen, or trackball, but it is not limited to these, and other input devices may be used. The storage means 9 is usually ROM, RAM, a floppy disk, or a hard disk, but it is likewise not limited to these, and other storage devices may be used. In particular, the dictionary memory 9a is preferably ROM. The buffer memory 9b, the idiom registration memory 9c, and the representative symbol memory 9d are preferably RAM.

[0029] The following means are usually implemented by a CPU: the idiom registration means 2, the representative symbol registration means 11, the dictionary lookup/morphological analysis means 3, the idiom translation means 4, the syntactic analysis means 5, the syntactic conversion means 6, and the translated sentence generation means 7. Preferably, a microcomputer with peripheral circuits such as ROM, RAM, and an I/O interface is used. The program that controls this document processing device is stored in the ROM or RAM.

[0030]

[Operation] When an idiom headword and its translation are registered, they are expressed as follows, and the idiom registration means 2 registers the result in the storage means 9:

- the fixed part of the idiom is expressed by ordinary words, word strings, or change-expansion symbols that represent variant forms of those words or word strings;
- the variable part is expressed in a composite form using first representative symbols, each standing for a set of words or word strings that share a given attribute.

[0031] This allows several idiom headwords to be registered as a single headword when their variable parts share the same attributes and their fixed parts may appear in various variant forms. The invention therefore does not require every conceivable pattern of the fixed and variable parts to be registered as a separate idiom headword, which keeps the number of registered headwords down.

[0032] The input character string is decomposed into words, and candidate idiom headwords that can match the expression form of part of the decomposed word string are retrieved. The fixed part of a candidate headword may contain a change-expansion symbol. If it does, the variant forms generated by the variant expansion means are compared with the word or word string at the position of that symbol, and the idiom is identified.
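Identification with a change-expansion symbol can be sketched as a matcher that tries every generated variant at the symbol's position. This passage does not give the spelling of change-expansion symbols, so the notation `%be`, its hand-written variant list (standing in for the output of variant expansion means 10), the crude `*Ving` test, and all function names are hypothetical.

```python
# Hypothetical notation: "%be" is a change-expansion symbol in the fixed part.
# Its variant forms would come from variant expansion means 10; listed here by hand.
EXPANSIONS = {
    "%be": [["is"], ["are"], ["was"], ["were"], ["will", "be"],
            ["is", "not"], ["was", "not"]],
}


def is_ving(word: str) -> bool:
    # Crude stand-in for the *Ving representative symbol (an assumption).
    return word.lower().endswith("ing")


def match_from(pattern, words, p, i, bound):
    """Match pattern[p:] against words[i:]; return bindings or None."""
    if p == len(pattern):
        return bound
    tok = pattern[p]
    if tok in EXPANSIONS:
        # Compare the words at this position with each generated variant form.
        for variant in EXPANSIONS[tok]:
            n = len(variant)
            if [w.lower() for w in words[i:i + n]] == variant:
                found = " ".join(words[i:i + n])
                result = match_from(pattern, words, p + 1, i + n, {**bound, tok: found})
                if result is not None:
                    return result
        return None
    if tok == "*Ving":
        if i < len(words) and is_ving(words[i]):
            return match_from(pattern, words, p + 1, i + 1, {**bound, tok: words[i]})
        return None
    # Ordinary fixed word: exact spelling (case-insensitive here).
    if i < len(words) and words[i].lower() == tok.lower():
        return match_from(pattern, words, p + 1, i + 1, bound)
    return None


def identify(headword: str, sentence: str):
    """Try every start position; return the first set of bindings found."""
    pattern = headword.split()
    words = sentence.rstrip(".").split()
    for start in range(len(words)):
        result = match_from(pattern, words, 0, start, {})
        if result is not None:
            return result
    return None


print(identify("There %be no *Ving", "There was no going to school yesterday."))
```

A single registered headword now covers the past-tense sentence from [0015] without a separate [There was no] entry.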

[0033] Thus, the invention can identify an idiom even when the input does not exactly match the registered headword. The word or word string in the input sentence that corresponds to the idiom's fixed part only needs to match one of the variant forms of the registered word or word string.

[0034] The variable part of an idiom can also be expressed in a form containing a second representative symbol and registered in the storage means 9. A second representative symbol stands for words or word strings that have a newly defined attribute and attribute value.

[0035] The input character string is decomposed into words, and candidate idiom headwords that can match the expression form of part of the decomposed word string are retrieved. The variable part of a candidate headword may contain a second representative symbol. If it does, the attribute and attribute value generated by the representative symbol expansion means are compared with the attribute and attribute value of the word or word string at the position of that symbol, and the idiom is identified.

[0036] Thus, the invention uses second representative symbols, which stand for words or word strings with newly defined attributes and attribute values, both to express the variable part of an idiom headword and to identify the idiom. This keeps the registered headwords short. It also lets users register and identify idioms that they themselves need, or whose expression forms are specific to texts in a particular field.

[0037]

[Embodiments] The invention is described in detail below with reference to the embodiments shown in the drawings, but it is not limited to them. Before the embodiments, the concept of machine translation is briefly explained.

As shown in Fig. 2, the analysis performed in machine translation has several levels. The source language is input at the upper left of Fig. 2. Machine translation then performs each level of processing in turn and finally obtains the target language shown on the right side of Fig. 2. Processing starts with dictionary lookup at level L1, followed by morphological analysis at level L2, syntactic analysis at level L3, and so on. Finally, morpheme generation at level L10 produces the target language.

[0038] Machine translation falls into two broad types, depending on how far analysis proceeds.

The first is the pivot method. It analyzes the input up to an intermediate language at level L6, a representation that depends on neither the source nor the target language. It then proceeds through context generation at level L7, semantic generation at level L8, syntactic generation at level L9, and morpheme generation at level L10 to produce the target language.

The second is the transfer method. It analyzes the input up to one of morphological analysis (L2), syntactic analysis (L3), semantic analysis (L4), or context analysis (L5) to obtain the internal structure of the source language. It converts that structure into a target-language internal structure at the same level and then generates the target language.
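The two methods can be contrasted by listing the levels each one passes through. The sketch assumes a mirror pairing between analysis and generation levels (L5 with L7, L4 with L8, L3 with L9, L2 with L10). That pairing is an assumption about Fig. 2, which is not reproduced here, and the function name `plan` is hypothetical.

```python
ANALYSIS = {1: "dictionary lookup", 2: "morphological analysis", 3: "syntactic analysis",
            4: "semantic analysis", 5: "context analysis", 6: "intermediate language"}
GENERATION = {7: "context generation", 8: "semantic generation",
              9: "syntactic generation", 10: "morpheme generation"}


def plan(method: str, depth: int = 3) -> list[str]:
    """List the levels a pivot or transfer translation passes through."""
    if method == "pivot":
        depth = 6  # pivot always analyzes up to the intermediate language
    elif method == "transfer":
        if not 2 <= depth <= 5:
            raise ValueError("transfer analyzes up to one of levels L2-L5")
    else:
        raise ValueError(f"unknown method: {method}")
    steps = [f"L{k} {ANALYSIS[k]}" for k in range(1, depth + 1)]
    if method == "transfer":
        steps.append(f"transfer at L{depth}")
    # Assumed mirror pairing: analysis level k hands over to generation level 12 - k.
    first_gen = max(12 - depth, 7)
    steps += [f"L{k} {GENERATION[k]}" for k in range(first_gen, 11)]
    return steps


print(plan("transfer", 3))
print(plan("pivot"))
```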

[0039] The analysis steps shown in Fig. 2 are as follows.
(1) Dictionary lookup and morphological analysis: the input sentence is divided into a morpheme (word) string by referring to a dictionary that stores morphemes. Grammatical information such as the part of speech of each word is obtained, together with its translation. Tense, person, number, and so on are then analyzed.

[0040] (2) Syntactic analysis: the structure of the sentence (parse tree), such as the dependencies between words, is determined.
(3) Semantic analysis: from the results of multiple structural analyses, the semantically correct ones are distinguished from the others.
(4) Context analysis: the topic of the input text is understood, and omissions and ambiguities in the input text are resolved.

[0041] Next, the block diagram of the machine translation apparatus according to an embodiment of the present invention, shown in FIG. 3, is described. In the figure, 31 is a main CPU (central processing unit), 32 is a main memory, 33 is a display device such as a CRT (cathode ray tube) or an LCD (liquid crystal display), 34 is a keyboard, 35 is a translation module, 36 is a dictionary memory connected to the translation module 35 that stores the translation dictionary, grammar rules, tree structure conversion rules, and the like, and 37 is a bus connecting the above components.

[0042] The dictionary memory 36 also has a storage area that can hold idioms and representative symbols uniquely defined by the user. The CPU 31 performs idiom registration and representative symbol registration, and controls the processing of the translation module 35 described below.

[0043] When a source language sentence is input, the translation module 35 translates it by a predetermined procedure and outputs the target language. That is, source language text entered from the keyboard 34 is sent to the translation module 35 under the control of the main CPU 31. The translation module 35 translates the input source language into the target language, as detailed later, using the dictionary, grammar rules, tree structure conversion rules, and the like stored in the dictionary memory 36. The result is temporarily stored in the main memory 32 and displayed on the display device 33.

[0044] FIG. 4 is a block diagram of the translation module 35. The translation module 35 includes: a translation CPU 45, connected to the bus 37, which translates the source language input via the bus 37 according to a predetermined translation program and outputs the result to the bus 37 as the target language; a translation program memory 46, connected to the bus 37, which stores the translation program executed by the translation CPU 45; a buffer A (40) for storing the input source-language text word by word; a buffer B (41) for storing information, such as the part of speech and translated word, obtained for each word in buffer A (40) by referring to the dictionary in the dictionary memory 36; a buffer C (42) for storing information on the structural analysis tree of the source language; a buffer D (43) for storing the target-language structural analysis tree converted from the source-language tree; and a buffer E (44) for storing a sentence put into target-language form by adding appropriate function words (in Japanese, particles, auxiliary verbs, and the like) to the target-language structural analysis tree stored in buffer D (43).
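As an illustration only, the five working buffers can be modeled as one record that the analysis stages fill in turn. This is a minimal Python sketch, assuming a simple regular-expression word splitter for buffer A; the class, field, and function names are hypothetical and do not come from the patent.

```python
import re
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TranslationBuffers:
    """Illustrative model of buffers A to E of FIG. 4 (field names are hypothetical)."""
    a_words: list = field(default_factory=list)      # buffer A (40): source text, one word per entry
    b_word_info: list = field(default_factory=list)  # buffer B (41): per-word POS, translation, ...
    c_source_tree: Any = None                         # buffer C (42): source-language analysis tree
    d_target_tree: Any = None                         # buffer D (43): converted target-language tree
    e_sentence: str = ""                              # buffer E (44): finished target-language sentence


def load_source(text: str) -> TranslationBuffers:
    """Split the input into words and punctuation marks and store them in buffer A."""
    buffers = TranslationBuffers()
    buffers.a_words = re.findall(r"[A-Za-z']+|[.,!?]", text)
    return buffers
```

For example, `load_source("This is a pen.").a_words` gives `["This", "is", "a", "pen", "."]`, with the remaining buffers still empty until later stages run.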

[0045] The translation module 35 configured as above performs analysis up to at least the level L3 syntactic analysis shown in FIG. 2. The translation program describing the translation procedure consists of a dictionary lookup/morphological analysis unit, an idiom translation unit, a syntactic analysis unit, a syntax conversion unit, a translated sentence generation unit, a variation expansion unit, and a representative symbol expansion unit.

[0046] The operation of English-to-Japanese translation by the machine translation apparatus of this embodiment is described below with reference to FIGS. 3 to 10. First, taking the English sentence "This is a pen.", which contains no idiom, as an example, an outline of its translation into Japanese is given.

[0047] First, the input source sentence is decomposed into morphemes by morphological analysis and stored in buffer A (40) (see FIG. 4), as shown in FIG. 5. Next, under the control of the translation CPU 45 running the program stored in the translation program memory 46, the dictionary lookup/morphological analysis unit consults the dictionary stored in the dictionary memory 36 for each word of the source sentence in buffer A (40), obtaining information such as the translated word for each word. For example, the part-of-speech information, which is part of this information, is stored in buffer B (41) as shown in FIG. 6.

[0048] Here, "this" is a multi-part-of-speech word with two parts of speech: pronoun and demonstrative adjective. The part of speech of "is" is verb. Likewise, the parts of speech of "a" and "pen" are stored in buffer B (41). Although "this" has multiple parts of speech, its part of speech in the sentence is uniquely determined by the processing corresponding to the syntactic analysis unit of the translation program.
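The dictionary lookup step can be pictured as follows: every candidate part of speech is recorded for each word, and the choice among them is deferred to syntactic analysis. This is a minimal Python sketch; the dictionary entries, the record layout, and the function name `look_up` are hypothetical stand-ins for the dictionary in memory 36 and the contents of buffer B.

```python
# Hypothetical dictionary entries: word -> list of (part of speech, translation).
DICTIONARY = {
    "this": [("pronoun", "これ"), ("demonstrative adjective", "この")],
    "is": [("verb", "である")],
    "a": [("article", "")],
    "pen": [("noun", "ペン")],
}


def look_up(words):
    """Build buffer-B style records holding every dictionary candidate for each word.

    Ambiguity is kept here; choosing one part of speech is left to syntactic analysis.
    """
    records = []
    for word in words:
        candidates = DICTIONARY.get(word.lower(), [])
        records.append({
            "word": word,
            "pos": [pos for pos, _ in candidates],
            "translations": [trans for _, trans in candidates],
        })
    return records
```

With `look_up(["This", "is", "a", "pen"])`, the record for "This" lists both "pronoun" and "demonstrative adjective", mirroring the multi-part-of-speech case described above.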

[0049] In the processing corresponding to the syntactic analysis unit of the translation program, a structural analysis tree showing the dependency relations between the words is determined, for example as shown in FIG. 7, according to the dictionary and grammar rules stored in the dictionary memory 36. The result of this syntactic analysis is stored in buffer C (42) of FIG. 4.

[0050] The structural analysis tree is determined as follows. From the grammar rules stored in the dictionary memory 36, the following grammar rules for English are obtained:
Sentence → Subject, Predicate
Subject → Noun phrase
Predicate → Verb, Noun phrase
Noun phrase → Pronoun
Noun phrase → Article, Noun

[0051] The first of these rules, for example, states that "a sentence consists of a subject and a predicate." The structural analysis tree is determined according to these rules. Similar grammar rules are also prepared for Japanese, and the English grammar rules are placed in correspondence with the Japanese grammar rules.

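To make the rule-driven tree construction concrete, the grammar rules of [0050] can drive a small top-down parser with backtracking. This Python sketch is an assumption, not the patent's parser: the lexicon is hypothetical, and the sketch simply shows how trying each part of speech of "this" against the rules leaves only the pronoun reading, as [0048] describes.

```python
# Grammar rules from [0050]; each right-hand side is a sequence of categories.
GRAMMAR = {
    "Sentence": [["Subject", "Predicate"]],
    "Subject": [["NounPhrase"]],
    "Predicate": [["Verb", "NounPhrase"]],
    "NounPhrase": [["Pronoun"], ["Article", "Noun"]],
}

# Hypothetical lexicon: word -> possible parts of speech ("this" is ambiguous).
LEXICON = {
    "this": ["Pronoun", "DemonstrativeAdjective"],
    "is": ["Verb"],
    "a": ["Article"],
    "pen": ["Noun"],
}


def parse(category, words, pos):
    """Yield (tree, next_pos) for every way `category` can start at words[pos]."""
    if category not in GRAMMAR:  # preterminal: consume one word with that part of speech
        if pos < len(words) and category in LEXICON.get(words[pos].lower(), []):
            yield (category, words[pos]), pos + 1
        return
    for rhs in GRAMMAR[category]:
        yield from expand(category, rhs, words, pos)


def expand(category, rhs, words, pos):
    """Match the symbols of one rule body in sequence, backtracking on failure."""
    def go(i, pos, children):
        if i == len(rhs):
            yield (category, *children), pos
            return
        for child, nxt in parse(rhs[i], words, pos):
            yield from go(i + 1, nxt, children + [child])
    yield from go(0, pos, [])


def parse_sentence(words):
    """Return the first tree that covers all words, or None."""
    for tree, end in parse("Sentence", words, 0):
        if end == len(words):
            return tree
    return None
```

Parsing `["This", "is", "a", "pen"]` yields a tree whose subject is `("NounPhrase", ("Pronoun", "This"))`; the demonstrative-adjective reading never appears because no rule uses it.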
[0052] In the processing corresponding to the syntax conversion unit of the translation program, the tree structure conversion rules in the dictionary memory 36 are used to convert the structural analysis tree of the input English sentence (see FIG. 7) into the structure of the Japanese parse tree shown in FIG. 8. The result is stored in buffer D (43) shown in FIG. 4. Through this conversion, the example sentence "This is a pen." becomes the Japanese character string 「これペンである」 (literally "this pen is").

[0053] The part of the translation program that performs the processing corresponding to the translated sentence generation unit adds the appropriate particle 「は」 (wa) and auxiliary verbs to the obtained Japanese string 「これペンである」, putting it into the Japanese form shown in FIG. 9, and stores it in buffer E (44) of FIG. 4. The resulting Japanese sentence 「これはペンである。」 ("This is a pen.") is output from the translation module 35 shown in FIG. 3, stored in the main memory 32, and displayed on the display device 33.
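The two steps in [0052] and [0053] (tree conversion, then particle insertion) can be sketched as below. This is a toy Python illustration under stated assumptions: the single conversion rule (the verb moves after its object, giving Japanese verb-final order), the word translations, and the function names are hypothetical simplifications of the tree structure conversion rules and generation step.

```python
# Hypothetical word translations (the English article has no Japanese counterpart).
JA = {"this": "これ", "is": "である", "a": "", "pen": "ペン"}


def is_leaf(tree):
    """A leaf is (category, word)."""
    return len(tree) == 2 and isinstance(tree[1], str)


def transfer(tree):
    """Translate leaves and reorder English V + NP into Japanese NP + V."""
    label, *children = tree
    if is_leaf(tree):
        return (label, JA.get(children[0].lower(), children[0]))
    converted = [transfer(child) for child in children]
    if label == "Predicate":
        converted.reverse()
    return (label, *converted)


def leaves(tree):
    """Collect the non-empty words of a tree from left to right."""
    label, *children = tree
    if is_leaf(tree):
        return [children[0]] if children[0] else []
    return [word for child in children for word in leaves(child)]


def generate(tree):
    """Join the words, marking the subject with the topic particle は and ending with 。."""
    _, subject, predicate = tree
    return "".join(leaves(subject)) + "は" + "".join(leaves(predicate)) + "。"
```

Applied to the tree for "This is a pen.", `"".join(leaves(transfer(tree)))` gives the intermediate string 「これペンである」, and `generate(transfer(tree))` gives 「これはペンである。」.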

[0054] The above outlines the translation of a sentence that contains no idiom. When a sentence contains an idiom, then in addition to the above processing, the idiom translation unit identifies and analyzes the idiom and generates its translation, and, in connection with the idiom translation unit, the variation expansion unit and the representative symbol expansion unit perform their processing.

[0055] As described later, the variation expansion unit matches the fixed parts of an idiom while taking variant forms such as inflectional changes into account. The representative symbol expansion unit performs matching for representative symbols uniquely defined by the user. The processing of each of these units is executed by the translation CPU 45 of the translation module 35 according to the procedure of the translation program.

[0056] Next, with reference to FIGS. 10 to 14, the registration of an idiom whose headword has a fixed part expressed with a variation expansion symbol is described. A variation expansion symbol has the following form:
*part-of-speech-symbol(word)
The "part-of-speech symbol" specifies which part of speech the "word" is to be inflected as when that word has multiple parts of speech.

[0057] For example, the idiom "as ~ as can be" is described as follows:
English headword: [as *a as *x(can) be.]
Translation: この上なく*aで ("supremely *a")
Here x is the part-of-speech symbol for an auxiliary verb; it indicates that the idiom also applies when can, as an auxiliary verb, appears in its past tense form could. The part-of-speech symbol x is specified because can is a multi-part-of-speech word.

[0058] That is, can is also used as a noun or a verb; without a part-of-speech specification, it would be difficult to determine whether it should be inflected as a noun (cans, etc.) or as a verb (canned).

[0059] The final be in this headword carries no variation expansion designation, because whether it follows can or could, be can only appear in its base form. Thus, when a word has inflected forms but forms other than the one in the headword must not be recognized, no variation expansion is designated. If idioms whose words should inflect are registered more often than not, the default may be reversed so that the designation is instead made for words that do not inflect.
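The headword notation can be read mechanically into three kinds of items: plain fixed words, fixed words with a variation expansion symbol such as `*x(can)`, and representative symbols such as `*a`. This Python sketch assumes whitespace-separated items and word-character symbol names; the `Token` type and `parse_headword` are hypothetical helpers, not part of the patent.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    kind: str      # "fixed", "expand" (variation expansion symbol) or "var" (representative symbol)
    text: str      # the word for fixed/expand items, the symbol name for var items
    pos: str = ""  # part-of-speech symbol of an expand item, e.g. "x"


EXPAND_RE = re.compile(r"\*(\w+)\((\w+)\)")  # *x(can): inflect "can" as part of speech x
VAR_RE = re.compile(r"\*(\w+)")              # *a, *Nsg: representative symbol


def parse_headword(headword):
    """Turn a bracketed headword such as '[as *a as *x(can) be.]' into tokens."""
    tokens = []
    for item in headword.strip("[] ").rstrip(".").split():
        if m := EXPAND_RE.fullmatch(item):
            tokens.append(Token("expand", m.group(2), m.group(1)))
        elif m := VAR_RE.fullmatch(item):
            tokens.append(Token("var", m.group(1)))
        else:
            tokens.append(Token("fixed", item))  # no designation: must match exactly, like "be"
    return tokens
```

Under this reading, the final `be` stays a plain fixed token, which is how the absence of a variation expansion designation in [0059] is represented.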

[0060] Translation using this idiom is described below, taking the input sentence [I was as happy as could be.] as an example. FIG. 10 is a flowchart of the translation processing for the idiom part.

[0061] First, in steps S111 to S112 of FIG. 10, dictionary lookup proceeds in order from the first word of the input sentence. Because idioms containing representative symbols are registered in the basic dictionary just like other words, the headword [as *a as *x(can) be] is retrieved while the third word, as, is being looked up (when s = 3) (steps S113, S117, S118).
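One way to retrieve idioms during ordinary word-by-word lookup is to index each headword under its first word. This Python sketch assumes that the first word of every headword is a plain fixed word (true of the examples here); the function names are hypothetical. Positions are counted from 1 to match s = 3 in the text.

```python
from collections import defaultdict


def build_idiom_index(headwords):
    """Map the first word of each headword to the list of headwords starting with it."""
    index = defaultdict(list)
    for headword in headwords:
        first = headword.strip("[] ").split()[0].lower()
        index[first].append(headword)
    return index


def candidates_for(words, index):
    """Yield (s, headword) for each input position s (1-based) whose word starts an idiom."""
    for s, word in enumerate(words, start=1):
        for headword in index.get(word.lower(), []):
            yield s, headword
```

For "I was as happy as could be", this yields candidates at s = 3 and s = 5; only the first survives the matching described next, so retrieval is cheap and the matching step does the filtering.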

[0062] Next, in step S114, each word of the idiom headword is matched against the words of the input sentence. Because matching fixed-part words in the idiom against words in the input sentence needs only string comparison and is therefore fast, the fixed parts alone are matched first.

[0063] In this example, fixed-part matching between [as *a as *x(can) be.] and "as happy as could be." begins.

[0064] FIG. 11 is a flowchart of this fixed-part matching. First, a variable p indicating the word number in the idiom is initialized (step S121). In step S122, it is determined whether the word W_p with word number p in the idiom is a fixed part; if it is not, that is, if it is a variable part, the processing proceeds to steps S132 and S133, and the processing is repeated until all words have been checked.

[0065] If the word W_p is a fixed part, the processing proceeds to step S123, where it is determined whether W_p carries a variation expansion designation. If it does, the processing proceeds to steps S125 and S126 for matching with inflectional changes applied; if it does not, W_p is compared with the input word W_s (step S124), and if they match, matching is repeated for the next word (steps S130 to S133, S122).

[0066] For this idiom, the first word of the headword, "as", matches the input word, so the processing moves to the next word of the headword, "*a" (steps S122 to S124, S130 to S133). Since "*a" is a representative symbol for a variable part, the processing moves on to the next word, "as" (steps S122, S132, S133). Because "as" carries no variation expansion designation, a string comparison is made (step S124), and the words are found to match.

[0067] Next, the processing moves to matching the next word of the idiom and of the input sentence, *x(can) and "could" (steps S130 to S133). Because *x(can) carries a variation expansion designation, the processing proceeds to step S125, where "can" is inflected as an auxiliary verb before the string comparison.

[0068] Because inflection here covers not only suffix changes such as tense and number but also the addition of auxiliary verbs and "not", the forms are generated using not only the suffix-processing table of the morphological analyzer but also the variant form table shown in FIG. 13. In this example, the input word "could" is found among the forms "could" and "can" obtained by expanding "can" (step S126), so matching proceeds to the next word (steps S130 to S133).

[0069] Matching then moves to the next word of the idiom, be, and the next input word, "be". These also match, so fixed-part matching succeeds (successful termination in step S133).
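The walk-through in [0064] to [0069] can be condensed into a small backtracking matcher. This is a sketch under stated assumptions, not the flowchart of FIG. 11 itself: the variant-form table holds only the forms named in the text ("can", "could"), multi-word forms such as "could not" are not handled, and a representative symbol here simply absorbs the shortest run of one or more words that lets the remaining items match. Checking what those captured words are is left to the separate variable-part step of FIG. 12.

```python
# Hypothetical excerpt of a variant form table (cf. FIG. 13): (word, POS symbol) -> forms.
VARIANT_TABLE = {("can", "x"): {"can", "could"}}


def expand_forms(word, pos):
    """All forms a variation-expanded word may take; the word itself is always included."""
    return VARIANT_TABLE.get((word, pos), set()) | {word}


def match_fixed(idiom, words, start):
    """Match idiom items against words[start:]; return {symbol: [words]} or None.

    Items are ("fixed", word), ("expand", word, pos) or ("var", name).
    """
    def go(i, s, bindings):
        if i == len(idiom):
            return bindings
        kind = idiom[i][0]
        if kind == "var":
            for end in range(s + 1, len(words) + 1):  # try 1, 2, ... words
                result = go(i + 1, end, {**bindings, idiom[i][1]: words[s:end]})
                if result is not None:
                    return result
            return None
        if s >= len(words):
            return None
        word = words[s].lower()
        if kind == "fixed" and word == idiom[i][1]:
            return go(i + 1, s + 1, bindings)
        if kind == "expand" and word in expand_forms(idiom[i][1], idiom[i][2]):
            return go(i + 1, s + 1, bindings)
        return None

    return go(0, start, {})
```

Matching `[("fixed", "as"), ("var", "a"), ("fixed", "as"), ("expand", "can", "x"), ("fixed", "be")]` against "I was as happy as could be ." from index 2 returns `{"a": ["happy"]}`, while starting at the second "as" fails.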

[0070] Next, variable-part matching in FIG. 10 (step S115) begins. FIG. 12 is a flowchart of this variable-part matching. First, the variable part is looked up in the dictionary (step S141). Fixed-part matching has already established that the target word of the representative symbol *a can be happy, which lies between the fixed parts, so happy is looked up.

[0071] Next, it is checked whether the representative symbols include a user-defined representative symbol (step S142); if so, the user representative symbol is replaced with its definition body (step S143). Next, it is checked whether the check can be made at the morpheme level (step S144). Since "*a" is a system-defined word-level part of speech, it is confirmed that the part of speech of "happy" is adjective (step S145), and variable-part matching ends.

[0072] If the target of the variable part consists of several words, that is, if the representative symbol is a phrase-level part of speech, syntactic processing is invoked and the specified attributes are checked (step S146). The above establishes that the input sentence contains the idiom
English headword: [as *a as *x(can) be.]
Translation: この上なく*aで
Finally, the following translation of the idiom part is generated and stored in the translation buffer, and idiom processing is complete (step S116):
Idiom part: [as happy as could be]
Translation: この上なく幸福で ("supremely happy")
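The morpheme-level check of step S145 and the filling of the translation template can be sketched together. In this Python sketch the lexicon entries, the symbol table, and both function names are hypothetical; multi-word targets are only flagged, since the text sends them to syntactic processing (step S146).

```python
# Hypothetical lexicon entries: word -> (part of speech, Japanese translation).
LEXICON = {"happy": ("adjective", "幸福"), "pen": ("noun", "ペン")}

# System-defined word-level representative symbols: symbol name -> required part of speech.
SYSTEM_SYMBOLS = {"a": "adjective"}


def check_variable(symbol, captured):
    """Morpheme-level check (cf. step S145) for a one-word variable part."""
    if len(captured) != 1:
        return False  # multi-word targets would go to syntactic processing (cf. step S146)
    entry = LEXICON.get(captured[0].lower())
    return entry is not None and entry[0] == SYSTEM_SYMBOLS.get(symbol)


def fill_translation(template, bindings):
    """Replace each *name in the translation template with its captured words' translations."""
    out = template
    # Longest names first, so that e.g. *Nsg is not clobbered by a shorter symbol *N.
    for name in sorted(bindings, key=len, reverse=True):
        out = out.replace("*" + name, "".join(LEXICON[w.lower()][1] for w in bindings[name]))
    return out
```

With the binding `{"a": ["happy"]}` from fixed-part matching, `check_variable("a", ["happy"])` succeeds and `fill_translation("この上なく*aで", {"a": ["happy"]})` produces 「この上なく幸福で」.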

[0073] The processing is then repeated (steps S117, S118), and dictionary lookup resumes from the word following the idiom. In this example the idiom extends to the end of the sentence, so dictionary lookup is complete at this point. Ordinary syntactic analysis and syntax conversion are then applied to the non-idiom word string "I was", and finally the translated sentence generation process combines the Japanese translation of the non-idiom part, 「私は〜あった」 ("I was ~"), with the Japanese translation of the idiom part, 「この上なく幸福で」, to obtain the following translation of the whole sentence:
「私は、この上なく幸福であった。」 ("I was supremely happy.")

[0074] Next, an example of an idiom that uses a representative symbol registered by the user, and the translation of a sentence using that idiom, are described. To register a new representative symbol, the user uses the following format:
<new symbol> "::=" <definition body>
A representative symbol described in this format is stored in the dictionary memory 36 by the CPU 31.

[0075] The definition body is written as follows:
"{" <grammatical attribute> "/" <value> "," <grammatical attribute> "/" <value> "," ... "}"
All parameters defined internally by the translation system are made available to the user as specifiable grammatical attributes and values (attribute values). This allows representative symbols to be defined with detailed grammatical and semantic constraints and then used easily for idiom registration. Attributes and values such as those shown in FIG. 14 can be considered.

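A definition written in the format of [0074] and [0075] can be split into the new symbol and its attribute/value pairs. This Python sketch assumes symbol names of the form `*` followed by word characters and values without commas or slashes; `parse_symbol_definition` is a hypothetical helper, and storing the result in dictionary memory 36 is not modeled.

```python
import re

DEF_RE = re.compile(r"\s*(\*\w+)\s*::=\s*\{(.*)\}\s*")


def parse_symbol_definition(line):
    """Parse '<new symbol> ::= {attr/value, attr/value, ...}' into (symbol, [(attr, value), ...])."""
    m = DEF_RE.fullmatch(line)
    if m is None:
        raise ValueError(f"not a symbol definition: {line!r}")
    symbol, body = m.groups()
    pairs = []
    for item in body.split(","):
        attr, sep, value = item.strip().partition("/")
        if not sep or not attr or not value:
            raise ValueError(f"bad attribute/value pair: {item!r}")
        pairs.append((attr.strip(), value.strip()))
    return symbol, pairs
```

Keeping the pairs as an ordered list, rather than a dict, preserves the order the user wrote them in; either choice would work for the checks that follow.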
[0076] Once the user can specify attributes and values at levels other than morphological analysis, it is no longer possible to check, at the dictionary lookup stage alone, whether a word string matches a user-defined representative symbol. For example, whether the part of speech of a word string is "noun phrase" cannot be known until syntactic analysis has been performed. FIG. 14 specifies, for each attribute value, the stage at which it can be checked.

[0077] The following description is based on the idiom "as <adjective> a <singular noun phrase without article> as … can" (「この上なく〜」, "supremely ~"). Because the word order of this idiom is unusual, the grammatical constraint "singular noun phrase without article" must be specified.

[0078] The registration and translation processing for a representative symbol expressing this grammatical constraint are described below. Using the table of FIG. 14, the representative symbol for "singular noun phrase without article", whose part of speech (cat) is noun phrase (n), inflected form (inf) is singular (sg), and usage (use) is no article (detnil), can be registered as:
*Nsg ::= {cat/n, inf/sg, use/detnil}

[0079] Using this representative symbol, the above idiom can easily be registered as follows:
English headword: [as *a a *Nsg as *x(can) be]
Translation: この上なく*a *Nsg

[0080] Next, translation using this idiom is described with reference to FIG. 12. Given the input sentence [I bought as large a hat as could be.], the processing from idiom retrieval through fixed-part matching is performed in the same way as in the embodiment above.

[0081] Next, variable-part matching is performed. In this example, the idiom contains a user-defined representative symbol (step S142), so the user representative symbol is replaced with its definition body (step S143).

[0082] The definition body of the representative symbol *Nsg in this example is expanded as {cat/n, inf/sg, use/detnil}. Referring to FIG. 14, cat/n and inf/sg can be checked at the morphological analysis level (step S145), whereas use/detnil (no article) can be confirmed only once syntactic analysis has been reached (step S146). In this example, since the "a" of "a hat" is part of the idiom headword, the variable part is only "hat", which is found to be a singular noun without article (*Nsg).
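The staged check can be sketched by sorting a symbol's constraints according to the stage able to verify them and testing each group when that stage's features become available. The stage table below is a hypothetical excerpt that covers only the three attributes named in [0082]; the real FIG. 14 table is not reproduced, and the function names are illustrative.

```python
# Stage at which each attribute can be checked, following the *Nsg example in [0082].
# Hypothetical excerpt of FIG. 14; a real table would list every attribute.
CHECK_STAGE = {"cat": "morphological", "inf": "morphological", "use": "syntactic"}


def split_by_stage(pairs):
    """Group (attribute, value) constraints by the analysis stage that can verify them."""
    stages = {"morphological": [], "syntactic": []}
    for attr, value in pairs:
        stages[CHECK_STAGE[attr]].append((attr, value))
    return stages


def satisfies(features, constraints):
    """True if every (attribute, value) constraint holds in the feature dict."""
    return all(features.get(attr) == value for attr, value in constraints)
```

For *Nsg, `split_by_stage([("cat", "n"), ("inf", "sg"), ("use", "detnil")])` leaves only `("use", "detnil")` for the syntactic stage, so "hat" can pass the first two checks during dictionary lookup and the article check after parsing.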

[0083] Once matching succeeds and the idiom has been recognized as above, processing continues as in the embodiment above, and the translation of the idiom part is obtained:
Idiom part: [as large a hat as could be]
Translation: この上なく大きな帽子 ("a supremely large hat")
The translation of the whole sentence, 「私はこの上なく大きな帽子を買った」 ("I bought a supremely large hat"), is then obtained.

[0084]

[Effects of the Invention]
According to this invention, the fixed part of an idiom headword is registered in an expression that contains variation expansion symbols; the fixed part so expressed is expanded into its modified expressions, and these are identified against the word or word string at the position of the variation expansion symbol. Consequently, multiple idiom headwords whose fixed parts can take various modified forms can be registered as a single headword. It is therefore unnecessary to register every conceivable pattern of the fixed part's modified forms, which limits the growth in the number of registered idiom headwords and further reduces idiom storage requirements and search time.

[0085] Further, according to this invention, the variable part of an idiom headword is expressed with a second representative symbol that represents words or word strings having newly defined attributes and attribute values, and idioms are identified on that basis. This keeps the headwords registered as idioms short, and makes it possible to register and identify idioms that the user personally needs or whose form of expression is specific to texts in a particular field.

[Brief description of drawings]

FIG. 1 is a block diagram showing the basic configuration of the machine translation apparatus of the present invention.

FIG. 2 is a diagram schematically showing the concept of machine translation.

FIG. 3 is a block diagram showing the configuration of an embodiment of the present invention.

FIG. 4 is a block diagram of the translation module 35 shown in FIG. 3.

FIG. 5 is a diagram schematically showing the contents stored in buffer A.

FIG. 6 is a diagram schematically showing the contents stored in buffer B.

FIG. 7 is a diagram schematically showing the contents stored in buffer C.

FIG. 8 is a diagram schematically showing the contents stored in buffer D.

FIG. 9 is a diagram schematically showing the contents stored in buffer E.

FIG. 10 is a flowchart showing dictionary lookup and idiom processing.

FIG. 11 is a flowchart showing fixed-part matching.

FIG. 12 is a flowchart showing variable-part matching.

FIG. 13 is a diagram schematically showing the variant form table.

FIG. 14 is a diagram schematically showing the attributes that can be specified for a representative symbol.

[Explanation of symbols]

1 Input means
2 Idiom registration means
3 Dictionary lookup/morphological analysis means
3a Part-of-speech extraction unit
3b Translated word extraction unit
4 Idiom translation means
4a Idiom search unit
4b Idiom identification unit
4c Idiom analysis unit
4d Idiom translation generation unit
5 Syntactic analysis means
6 Syntax conversion means
7 Translated sentence generation means
8 Output means
9 Storage means
9a Dictionary memory
9b Buffer memory
9c Idiom registration memory
9d Representative symbol registration memory
10 Variation expansion means
11 Representative symbol registration means
12 Representative symbol expansion means
31 Main CPU
32 Main memory
33 Display device
34 Keyboard
35 Translation module
36 Dictionary memory
37 Bus

Continuation of the front page: (72) Inventor: Takeshi Kutsumi, c/o Sharp Corporation, 22-22 Nagaike-cho, Abeno-ku, Osaka-shi, Osaka

Claims


1. A machine translation device having an idiom registration function, comprising: input means for inputting character strings and symbols; idiom registration means for registering, for an idiom consisting of a fixed part made up of a predetermined word or word string and a variable part that can change to words or word strings having a common attribute, a headword in which the fixed part is expressed by an ordinary word, an ordinary word string, or a variation expansion symbol representing modified expressions of the word or word string, and the variable part is expressed in a format combining first representative symbols, each representing a set of words or word strings that share a predetermined attribute, together with the translation of the idiom; storage means for storing the dictionaries and processing results needed for idiom registration and translation processing; dictionary lookup/morphological analysis means for decomposing an input word string into morphemes and performing grammatical analysis; variation expansion means for generating and expanding the fixed part of the headword of an idiom to be translated into all preset modified expressions; idiom translation means for identifying the input character string or a part thereof with a registered idiom headword, or with a headword whose fixed part has been expanded into modified expressions, and for generating a translation of the character string corresponding to the identified idiom headword; syntactic analysis means; syntax conversion means; translated sentence generation means; and output means for outputting the translated sentence.

2. The machine translation device having an idiom registration function according to claim 1, wherein the variation expansion means generates and expands the fixed part of an idiom headword into expression forms in which the words constituting the fixed part are inflected or otherwise changed, or into expression forms in which an auxiliary verb or a negative adverb is attached to the fixed part.

3. The machine translation device having an idiom registration function according to claim 1 or 2, further comprising: representative symbol registration means for registering in the storage means a second representative symbol representing a word or word string having a newly defined attribute and attribute value; and representative symbol expansion means for generating and expanding a second representative symbol contained in the variable part of the headword of an idiom to be translated into its defined attributes and attribute values; wherein the idiom registration means registers an idiom headword expressed using the first representative symbol and/or the second representative symbol input by the input means, together with its translation, and the idiom translation means identifies the attributes and attribute values of the input character string or a part thereof with the attributes and attribute values of the second representative symbol of the headword generated and expanded by the representative symbol expansion means.