JPS62184572A

JPS62184572A - Retrieving system for dictionary of cooperative compound word in word division device

Info

Publication number: JPS62184572A
Application number: JP61027288A
Authority: JP
Inventors: Masayuki Kameda; 雅之亀田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1986-02-10
Filing date: 1986-02-10
Publication date: 1987-08-12

Abstract

PURPOSE: To enable a cooperative compound word to be retrieved by checking, when a character string segmented from an input character string matches a first entry and a second or subsequent entry is present, whether the character string given in that entry appears in the remaining character string. CONSTITUTION: An input character string 1 is input to a word division device 2, divided into words with reference to a dictionary 3, and the result is output as a word division output 4. In the dictionary 3, each cooperative compound word is recorded as a cooperative compound word entry 31, which designates in its second and subsequent entries the word or words that cooperate with the first entry. When a segmented character string matches the first entry, a second entry is present, and the word designated in the second entry appears in the input character string, the cooperative compound word is added to the division candidate words. In this way, text containing cooperative compound words can be translated accurately and rapidly.

Description

[Detailed description of the invention]

[Table of contents] Overview; Field of industrial application; Conventional technology; Problems that the invention is supposed to solve; Means for solving problems; Operation; Embodiment; Effect of the invention

[Overview]

The invention concerns word segmentation of input sentences in machine translation and similar applications. It addresses input sentences containing a compound word (a cooperative compound word) whose separated component words cooperate to give a meaning different from the meanings of the original words. A dictionary is provided in which the heading of a cooperative compound word consists of multiple entries. When the dictionary specifies a cooperating word, the device checks whether that word is present in the input character string, so that the dictionary content corresponding to the cooperative compound word can be looked up.

[Field of industrial application]

The present invention relates to a dictionary retrieval method for cooperative compound words in a word division device used for machine translation.

When machine translation is performed with an automatic translation device, the sentence to be translated is divided into words, word information is attached to each divided word, and sentence analysis is performed on that basis. If the input sentence contains a cooperative compound word (words that, through the cooperation of separated words, take on a meaning different from that of the original words), the device must retrieve the dictionary content that corresponds to the meaning of the cooperative compound word, not the dictionary content of each component word. An efficient retrieval method for this purpose is therefore desired.

[Conventional technology]

The basic method of a conventional word division device is to cut out, in order from the head of the input character string, the portions that match dictionary entries, and at the same time to attach the corresponding dictionary content.

FIG. 6 outlines this conventional word division device and its dictionary, using English-to-Japanese translation as an example. In FIG. 6, 51 denotes the input character string, 52 the word division device, 53 the dictionary, and 54 the word division output.

In the dictionary 53, each word is listed with its corresponding translation, and idioms such as "give away" and "give up" are entered after "give". In the sentence "I will give it up soon" shown as the input character string 51, the words of "give up" are separated by "it" and cooperate with each other to mean 「あきらめる」 (to give up, to abandon). In this case, however, as shown in the word output 54, the meaning of the cooperative compound word is not obtained; only the dictionary content of the individual words that make up the cooperative compound word is obtained.
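The limitation described above can be illustrated with a minimal sketch of head-first dictionary matching. This is not the patent's implementation: the function name `segment_conventional`, the longest-entry-first policy, and every dictionary content string except the gloss for "give up" are assumptions made for illustration.

```python
# Toy dictionary in the spirit of dictionary 53: entry string -> dictionary content.
# Only the gloss for "give up" comes from the text; the rest are placeholders.
DICTIONARY = {
    "I": "<content for I>",
    "will": "<content for will>",
    "give": "<content for give>",
    "give away": "<content for give away>",
    "give up": "あきらめる",
    "it": "<content for it>",
    "up": "<content for up>",
    "soon": "<content for soon>",
}


def segment_conventional(text):
    """Cut dictionary entries off the head of `text`, longest entry first."""
    output = []
    pos = 0
    while pos < len(text):
        if text[pos] == " ":  # skip the space between words
            pos += 1
            continue
        # Try the longest slice starting at pos, then shorter ones.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in DICTIONARY:
                output.append((piece, DICTIONARY[piece]))
                pos = end
                break
        else:
            raise ValueError(f"no dictionary entry at position {pos}")
    return output


words = [word for word, _ in segment_conventional("I will give it up soon")]
print(words)  # ['I', 'will', 'give', 'it', 'up', 'soon']
```

Because "it" sits between "give" and "up", the contiguous entry "give up" never matches, so the output contains only the individual words.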

In practice, the dictionary may contain different entries with the same character string, or entries of different lengths that share the same leading character string. When there are multiple candidates, either one is selected by means such as an appropriate evaluation function or a connection check against the preceding and following candidates, or all possible divisions are presented.

In either case, for a cooperative compound word such as "give up" in "give it up", the division method of a conventional word division device cannot produce an accurate translation. Conventional devices therefore handle the cooperation separately, at a later stage, during sentence analysis.

[Problems that the invention is supposed to solve]

As described above, conventional devices rely on sentence analysis to translate cooperative compound words, which increases the burden on sentence analysis. In addition, the information on the cooperative compound word must be included in the information for the original words, which makes the dictionary content complicated.

The present invention was made to remedy these problems. Its object is to provide a dictionary retrieval method for cooperative compound words in a word division device that, with a simple configuration, can retrieve an appropriate translation of a cooperative compound word without relying on sentence analysis.

[Means for solving problems]

To solve the above problems, in the present invention, as shown in FIG. 1, an input character string 1 is input to a word division device 2 and divided into words with reference to a dictionary 3, and the result is output as a word division output 4.

In the dictionary 3, each cooperative compound word is recorded as a cooperative compound word entry 31, in which, in addition to the first-entry word, the cooperating word or words are specified as the second and subsequent entries.

When the first entry matches, second or subsequent entries exist, and the words specified there are found elsewhere in the input character string, the cooperative compound word is added to the division candidate words.
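The rule stated above can be sketched in a few lines. This is a hedged illustration only: the tuple representation of an entry, the function name `cooperative_candidates`, and the whole-word test via `split()` are assumptions, and the sketch does not check the order in which later entries occur.

```python
def cooperative_candidates(text, entries):
    """Return the entries whose first part heads `text` and whose later
    parts all occur as words in the remainder of `text`."""
    candidates = []
    for first, *rest in entries:
        if not text.startswith(first):
            continue
        remaining_words = text[len(first):].split()
        # An entry with no second entry passes trivially (all([]) is True).
        if all(word in remaining_words for word in rest):
            candidates.append((first, *rest))
    return candidates


ENTRIES = [("give",), ("give", "away"), ("give", "up")]
print(cooperative_candidates("give it up soon", ENTRIES))
# [('give',), ('give', 'up')]
```

The entry for "give away" is rejected because "away" does not occur in the rest of the input, while "give up" is accepted even though "it" separates its two words.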

[Operation]

This makes it possible to translate sentences containing cooperative compound words accurately and quickly.

[Embodiment]

Next, the operation of one embodiment of the present invention is described with reference to FIGS. 2, 3, and 4.

FIG. 2 shows the search entry sections of the dictionary for an ordinary word and for cooperative compound words. FIG. 3 shows the algorithm that retrieves every division candidate word containing the character string at the head of the part of the input character string that has not yet been divided into words. FIG. 4 shows the progress of the search for division candidates for the character string "give it up soon", which contains a cooperative compound word.

In the dictionary 3, as shown in FIG. 2, a cooperative compound word has, in addition to its first-entry word, the cooperating word or words specified as second and subsequent entries. For example, "give" has four entry sections: entry section 21 for "give" used alone, entry section 22 for "give away" as a cooperative compound word, entry section 23 for "give up", also as a cooperative compound word, and entry section 24 for the idiom "give in". The number placed before the word in each entry section is the number of characters in the word that follows. Entry section 21, for "give" alone, therefore has "4" written before "give", followed by "0", which indicates that there is no entry after "give". For cooperative compound words, as shown in entry sections 22 and 23, "4" is placed before the first entry "give", and "4" or "2", the respective character count, is placed before the second entry "away" or "up".

A simple idiom, as shown in entry section 24, is entered with the space between its words included in the character count, so for "give in" the leading number is "7".
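The length-prefixed entry layout described above can be sketched as a flat string. FIG. 2 is not reproduced in this text, so the exact record format is an assumption: the sketch closes every record with a count of 0 (the text states this only for entry section 21), and the names `encode_entry` and `parse_entry` are hypothetical.

```python
def encode_entry(parts):
    """Encode ['give', 'up'] as '4give2up0': each part is preceded by its
    character count, and a count of 0 closes the record."""
    return "".join(f"{len(part)}{part}" for part in parts) + "0"


def parse_entry(record):
    """Inverse of encode_entry: read (count, text) pairs until a 0 count."""
    parts = []
    i = 0
    while i < len(record):
        j = i
        while j < len(record) and record[j].isdigit():
            j += 1
        count = int(record[i:j])  # raises ValueError on a malformed record
        if count == 0:
            break
        parts.append(record[j:j + count])
        i = j + count
    return parts


for record in ["4give0", "4give4away0", "4give2up0", "7give in0"]:
    print(record, "->", parse_entry(record))
```

In this layout the idiom "give in" is a single part of length 7, space included, whereas "give up" is two parts of lengths 4 and 2. A part that itself begins with a digit would be ambiguous here, which a real record format would need to handle.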

The retrieval method of the invention is now described using the character string "I will give it up soon" as an example. Processing up to "I will" is the same as in the conventional method, so the description starts from "give it up". First, the length of the partial character string from "g" to "p", counting spaces, is 10; this value is stored as n0, and the search character length is set to n = 0 (step ■).

Next, the search character length is set to n = 1 and, after confirming that n does not exceed n0 (step ■), the dictionary is searched for the string of length n = 1, that is, "g".

Here "g" is found, and because "g" has no second entry (see dictionary 3 in FIG. 1) (step ■), it is registered as a division candidate as it is (step ■).

Processing then returns to step ■, but there is no further entry of character length 1, so it returns to step ■ and the character length is increased by 1 each time. For n = 2 and n = 3 the strings are "gi" and "giv", for which there are no entries. When n reaches 4, "give" of entry section 21 is found first. It has no second entry (step ■), so it is added as a candidate word. Still at n = 4, "give" of entry section 22 is found next. It has "away" as its second entry, so the rest of the input character string is searched for "away"; since "away" is not present, this entry does not become a candidate word (step ■).

Next, "give" of entry section 23 is found. It also has a second entry, "up", so processing moves to step ■ and the input character string is searched for "up". This time "up" is present, so the entry passes through steps ■ and ■ and is added as a candidate word.

After that there are no more matching entries for n = 4, nor for n = 5 and above, so the candidates that contain the character string from the head of this partial string are the three items "g", "give", and "give up". FIG. 4 summarizes this.
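The walkthrough above can be replayed with a short sketch. It follows the loop as described in the text (n runs from 1 to n0, and each matching entry is checked for a second entry), but the tuple representation of entries, the whole-word test via `split()`, and the names `ENTRIES` and `find_candidates` are assumptions, since FIG. 3 itself is not available here.

```python
# Entry sections 21-24 for "give", plus the one-letter entry "g" that the
# walkthrough says is found. Each tuple is (first entry, second entry, ...).
ENTRIES = [
    ("g",),
    ("give",),          # entry section 21
    ("give", "away"),   # entry section 22
    ("give", "up"),     # entry section 23
    ("give in",),       # entry section 24 (space counted in the length)
]


def find_candidates(partial, entries):
    """Collect every entry that matches a head of `partial`, for n = 1..n0."""
    n0 = len(partial)          # partial string length, spaces included
    candidates = []
    n = 0
    while n < n0:
        n += 1                 # lengthen the search string by one character
        head = partial[:n]
        remaining_words = partial[n:].split()
        for first, *rest in entries:
            if first != head:
                continue
            if not rest:
                candidates.append((first,))          # no second entry
            elif all(word in remaining_words for word in rest):
                candidates.append((first, *rest))    # partner word found
    return candidates


print(find_candidates("give it up", ENTRIES))
# [('g',), ('give',), ('give', 'up')]
```

Entry section 24 never matches in this run because the head of length 7 is "give it", not "give in".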

Of the candidates selected in this way, "g" is dropped because it does not connect with any word candidate cut out for the "ive" that follows it. Then, by the longest-match method, the best-known evaluation method, "give up" (character length 6) is selected in preference to "give" (character length 4).
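The selection step can be sketched as a connection check followed by longest match. The text does not spell out how the connection check works, so the test used here (what follows the first entry must be empty or begin with some dictionary entry) is an assumption, as are the names `select_candidate`, `CANDIDATES`, and `HEADS`.

```python
def select_candidate(text, candidates, heads):
    """Drop candidates that do not connect, then take the longest one."""

    def connects(candidate):
        # What follows the first entry must be empty or start with some entry.
        following = text[len(candidate[0]):].lstrip(" ")
        return following == "" or any(following.startswith(h) for h in heads)

    def length(candidate):
        # Character length as counted in the text: "give" + "up" = 6.
        return sum(len(part) for part in candidate)

    viable = [c for c in candidates if connects(c)]
    return max(viable, key=length)   # raises ValueError if nothing connects


CANDIDATES = [("g",), ("give",), ("give", "up")]
HEADS = {"g", "give", "it", "up", "soon"}   # hypothetical first entries
print(select_candidate("give it up soon", CANDIDATES, HEADS))
# ('give', 'up')
```

Here "g" is removed because nothing in the hypothetical entry set starts at "ive", and "give up" wins over "give" on length, 6 against 4.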

FIG. 5 shows an embodiment of a word division device used with the dictionary retrieval method of the present invention. It comprises an input device 20, a CPU 10, a dictionary 30, a division result storage section 40, and a character string storage section 50. The CPU 10 contains a word division control section 11, a character string extraction section 12, a dictionary search and collation section 13, and a division candidate storage section 14.

As described above, the dictionary 30 registers each cooperative compound word in the form of a first entry and a second entry.

When a character string is input from the input section 20, it is stored in the character string storage section 50. Under the control of the control section 11, the character string extraction section 12 cuts the character string into minimum-unit words. Each of these is compared and collated with the dictionary 30 by the method described above, division candidates are selected by the method described above, and the candidates are stored in the division candidate storage section 14. An evaluation function such as the longest-match method is then applied to the candidates, and the result is stored in the division result storage section 40.

The above description used English-to-Japanese translation as an example, but the present invention is of course not limited to these languages.

[Effect of the invention]

The present invention makes it possible to retrieve cooperative compound words, which conventional word division devices did not do. It also allows the dictionary information for a cooperative compound word to be described independently of the words that make it up, so that translation can be performed efficiently.

[Brief explanation of drawings]

FIG. 1 is a schematic diagram for explaining the present invention.
FIG. 2 is a diagram showing the search entry sections of the dictionary of the present invention.
FIG. 3 is a diagram showing the search algorithm for division candidate words.
FIG. 4 is a diagram showing an example of the search progress for division candidate words.
FIG. 5 is a diagram showing an example of a word division device used in the present invention.
FIG. 6 is a diagram showing an outline of a conventional word division device and dictionary.

Claims

[Claims] A dictionary retrieval method for cooperative compound words in a word division device for machine translation that divides an input character string into words and retrieves the corresponding translations, characterized in that a dictionary (3) is provided, and in that, when a character string cut out from the input character string matches a first entry and a second or subsequent entry exists, the remaining input character string is searched to determine whether the character string indicated by that entry is present.