JPS6389976A

JPS6389976A - Language analyzer

Info

Publication number: JPS6389976A
Application number: JP61234328A
Authority: JP
Inventors: Toshihiko Yokogawa; 横川　壽彦
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1986-10-03
Filing date: 1986-10-03
Publication date: 1988-04-20
Anticipated expiration: 2011-03-04
Also published as: JPH0821031B2

Abstract

PURPOSE: To analyze many kinds of character strings without storing a large amount of data, by searching a basic dictionary means for the character string and, when part of the string is found there, searching the same dictionary in the same way for the remaining parts of the string. CONSTITUTION: A processing part 16 divides English text received from an input part 14 into search key character strings and first looks each one up in a dictionary file 22 through a dictionary retrieving part 20. When the file 22 has no entry for a search key character string, the processing part passes that string to a unit recognizing part 24 for unit recognition. The recognizing part 24 divides the search key character string, indicating each part with a pointer P, and searches a fundamental dictionary file 26 for each divided part. A string recorded in the file 26, or a string made up of a succession of recorded strings, is recognized as a character string expressing a unit. It is therefore enough to store only k, m, s, and the like in the file 26; combinations such as km and km/s need not be stored, so the capacity of the file can be reduced.

Description

[Detailed description of the invention]

TECHNICAL FIELD: The present invention relates to a language analysis device, and in particular to a language analysis device useful for automatic translation devices.

PRIOR ART: When a Japanese sentence is produced from a corresponding sentence in a foreign language such as English, the morphemes of the input English sentence are analyzed, its syntax is parsed, its sentence structure is converted, and a Japanese translation is then generated.

That is, the morphemes that make up the input sentence, such as its individual words, are analyzed by searching a dictionary, which yields information such as the part of speech of each morpheme. Based on that information, grammar rules are applied to analyze the modification relationships between the words, or between the blocks formed from those words. This gives the structure of the input sentence, that is, its syntactic analysis. The sentence structure of the input sentence is then converted into Japanese syntactic order based on the parsed syntax, Japanese morphemes are generated according to the converted order, and the Japanese sentence is produced.

In the morphological analysis performed during this kind of translation, the part of speech and other information for morphemes such as words is obtained by searching a dictionary. Most ordinary words, such as nouns and verbs, can be stored in the dictionary, so they are easily found and their information obtained.

However, expressions for units of length, velocity, acceleration, and so on exist in a very large number of varieties. Storing all of them in a dictionary would needlessly increase the storage capacity required for dictionary information and is inefficient. This is because many of these units are compound expressions that combine several unit expressions, such as m/s and km/s.

On the other hand, a device that stores only some of these units in its dictionary cannot obtain information on the compound unit expressions contained in an input sentence. It therefore cannot perform morphological analysis on them and risks producing an incorrect language analysis.

PURPOSE: The present invention aims to eliminate these drawbacks of the prior art and to provide a language analysis device that can perform morphological analysis of input sentences containing character strings made up of compound expressions, without storing all such character strings in a dictionary.

CONSTITUTION: To achieve the above purpose, the present invention has an input means for inputting a character string in a predetermined language, a basic dictionary means that stores basic data and is used for searching the input character string, and an analysis means that analyzes the character string by searching the basic dictionary means for the input character string. The invention is characterized in that, when searching the basic dictionary means for the input character string retrieves a part of the character string, the analysis means analyzes the character string by searching the basic dictionary means in the same way for the other parts of the character string. The invention is described in detail below based on one embodiment.

FIG. 1 shows an embodiment in which the language analysis device according to the present invention is applied to an English-Japanese automatic translation device. The present invention can also be used for purposes other than morphological analysis of English input sentences. Needless to say, it can be applied effectively not only to an English-Japanese translation device that translates English into Japanese, but also to any automatic translation device that translates one language into another.

This embodiment has an input unit 14, which receives data from an input device 10 or an input document file 12. The input device 10 includes, for example, a keyboard with character keys such as alphanumeric keys and with function keys, or an optical character reader that reads English text recorded on paper. The input document file 12 is a storage device in which English text is recorded on a storage medium such as a magnetic disk.

The input unit 14 has an input string buffer 14a, in which it stores the English input sentence received from the input device 10 or the input document file 12. The input unit 14 reads the input sentence stored in the input string buffer 14a and outputs it to the processing unit 16.

The processing unit 16 is a functional unit that performs morphological analysis of the input sentence sent from the input unit 14 by searching the dictionary files. It has a dictionary information storage table 16a, in which it stores the information obtained by searching the dictionary file 22 or the basic unit dictionary file 26, both described later.

The processing unit 16 extracts search key strings, the units used for dictionary lookup, from the character string that makes up the input sentence received from the input unit 14. The extraction proceeds in order from the first character of the input sentence according to predetermined extraction rules.

For example, the input sentence is divided in order from its beginning at delimiters such as spaces and commas, and each resulting character string becomes a search key string. In this case, character strings representing units, such as m, km, and m/s, each become a search key string as they stand. The processing unit 16 sends each search key string extracted from the input sentence to the dictionary search unit 20.
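The delimiter-based division described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `split_search_keys` is hypothetical, and the delimiter set (whitespace and commas) is an assumption, since the text only says "spaces, commas, and the like".

```python
import re


def split_search_keys(sentence: str) -> list[str]:
    """Split an input sentence into search key strings, front to back.

    Assumption: whitespace and commas are the only delimiters.
    """
    # re.split can yield empty strings at the edges, so filter them out.
    return [token for token in re.split(r"[\s,]+", sentence) if token]


keys = split_search_keys("The car moves at 30 km/s, not 30 m/s")
print(keys)
# ['The', 'car', 'moves', 'at', '30', 'km/s', 'not', '30', 'm/s']
```

Note that a unit expression such as km/s survives as a single search key because the slash is not treated as a delimiter.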

The dictionary search unit 20 searches the dictionary file 22 based on the search key string sent from the processing unit 16. As shown in FIG. 2, the dictionary file 22 stores entries together with grammatical information such as parts of speech. If the dictionary file 22 contains an entry for the search key string, the dictionary search unit 20 reads out the part-of-speech and other information for that entry and outputs it to the processing unit 16. If the search finds no entry in the dictionary file 22, the dictionary search unit 20 reports this to the processing unit 16.

The processing unit 16 stores the part-of-speech and other information retrieved by the dictionary search unit 20 in the dictionary information storage table 16a. If the dictionary file 22 has no entry for the search key string, the processing unit 16 outputs that search key string to the unit recognition unit 24.

The unit recognition unit 24 searches the basic unit dictionary file 26 based on the search key string sent from the processing unit 16. As shown in FIG. 3, the basic unit dictionary file 26 stores basic unit entries. If the basic unit dictionary file 26 contains a basic unit entry for the search key string, the unit recognition unit 24 reads out that entry. If it does not, the unit recognition unit 24 divides the search key string into several character strings, as described later, and searches the basic unit dictionary file 26 several times. If every one of those searches finds a basic unit entry, compound unit information is obtained from those entries. If any one of the searches finds no basic unit entry, the result is information that the word is not registered in the dictionary.

The unit recognition unit 24 outputs the basic unit entry, the compound unit information, or the information that the word is not registered in the dictionary to the processing unit 16. The processing unit 16 stores the information received from the unit recognition unit 24 in the dictionary information storage table 16a.

As shown in FIG. 4, the dictionary information storage table 16a stores an entry for each search key string together with the grammatical information, such as the part of speech, obtained by searching the dictionary file 22 or the basic unit dictionary file 26 for that string. After these data have been stored in the dictionary information storage table 16a, the processing unit 16 outputs them, together with the input sentence, to the output interface 18. The output interface 18 outputs the input sentence and the morphological analysis data received from the processing unit 16 to an output device 30 such as a printer or display, or to a storage file 32 such as a magnetic disk.

Alternatively, the output interface 18 may be omitted. In that case the input sentence and the morphological analysis data output from the processing unit 16 are fed directly to a syntax analysis means (not shown), which parses the input sentence, and a translation is then generated based on that parse.

The control unit 28 controls the operation of each functional unit of the device and is advantageously implemented with a microprocessor.

The operation of the device is explained below with reference to the flowchart shown in FIG. 5.

First, an English input sentence is read into the input unit 14 from the input device 10 or the input document file 12 (100). The input sentence read into the input unit 14 is stored in the input string buffer 14a, then read out of the buffer and output to the processing unit 16.

When an input sentence arrives, the processing unit 16 extracts dictionary lookup units (102). That is, the character string making up the input sentence is divided, in order from its beginning and according to predetermined rules, into search key strings, the units used when searching the dictionary file 22 or the basic unit dictionary file 26. The processing unit 16 determines whether an extracted search key string exists (104) and, if so, sends it to the dictionary search unit 20.

When the dictionary search unit 20 receives a search key string, it searches the dictionary file 22 for it (106) and determines whether the string appears among the entries of the dictionary file 22 shown in FIG. 2 (108). If an entry exists, the grammatical information stored for it in the dictionary file 22, such as the part of speech, is read out, sent to the processing unit 16, and recorded in the dictionary information storage table 16a (110). Processing then returns to step 102, and the next dictionary lookup unit is extracted.

If the dictionary file 22 has no entry, the dictionary search unit 20 sends the search key string back to the processing unit 16. The processing unit 16 passes it to the unit recognition unit 24, which performs unit recognition (112).

When the search key string sent to the dictionary search unit 20 is an ordinary word such as a noun or verb, the dictionary file 22 almost always has an entry for it. Grammatical information such as the part of speech is then read from the dictionary file 22, sent to the processing unit 16, and recorded in the dictionary information storage table 16a. As noted above, the dictionary file 22 holds entries for ordinary words such as nouns and verbs but none for character strings representing units. When the search key string represents a unit, such as km or m/s, the dictionary file 22 therefore has no entry for it, and processing proceeds to step 112, where unit recognition is performed.

The unit recognition operation of step 112 is explained with reference to FIG. 6.

When a search key string for which the dictionary file 22 had no entry is sent from the processing unit 16 to the unit recognition unit 24, the unit recognition unit 24 sets a pointer P to the first character of the search key string (200).

Next, the unit recognition unit 24 searches the basic unit dictionary file 26 for the character string starting at the character to which pointer P is set (202). The search checks whether a basic unit with an entry in the basic unit dictionary file 26 appears in full in that string, with the character at pointer P as its starting point. In other words, it checks whether a string of one or more characters beginning at pointer P matches any basic unit entered in the basic unit dictionary file 26. For example, if the character at pointer P is k, m, or s, that single character has an entry in the basic unit dictionary file 26, as shown in FIG. 3.

The unit recognition unit 24 determines whether the search found an entry in the basic unit dictionary file 26 (204). If an entry exists, it advances pointer P by the length of the recognized basic unit (206). When the basic unit is k, m, s, or the like, pointer P therefore advances by one character and is set to the next character in the search key string.

The unit recognition unit 24 then determines whether any character string remains starting at the character to which pointer P is set (208). If so, processing returns to step 202, and the basic unit dictionary file 26 is searched again with the string beginning at pointer P. The unit recognition unit 24 again determines whether the search found an entry in the basic unit dictionary file 26 (204) and, if it did, advances pointer P by the length of the recognized basic unit.

If, in step 208, no character string remains starting at the character to which pointer P is set, the search of the basic unit dictionary file 26 is complete and the compound unit has been recognized successfully.

For example, suppose the search key string sent to the unit recognition unit 24 is km/s, which represents a unit. Because km/s is itself a compound unit, it has no entry in the basic unit dictionary file 26. Pointer P is therefore first set to k (200), and k is looked up in the basic unit dictionary file 26 to confirm that an entry exists (202).

Pointer P is then set to m (206), and m is looked up in the basic unit dictionary file 26 (202) to confirm an entry in the same way. Because the unit recognition unit 24 treats the slash (/), the middle dot (·), and similar symbols as part of a unit, it next skips the / in km/s and sets pointer P to s (206). It then looks up s in the basic unit dictionary file 26 and again confirms an entry (202). Since entries for k, m, and s were all found in the basic unit dictionary file 26, km/s is judged to be a character string representing a unit.

In this way, a search key string is judged to be a character string representing a unit when entries exist in the basic unit dictionary file 26 for all the characters that make it up, or for all of its characters other than symbols regarded as part of a unit, such as the slash and the middle dot.
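The pointer-driven procedure of FIG. 6 can be sketched as below. This is an illustrative sketch under stated assumptions, not the patent's implementation: the names `BASIC_UNITS`, `SEPARATORS`, and `recognize_unit` are hypothetical, the dictionary contents are limited to the k, m, s examples given in the text, and trying the longest candidate first is an assumption, since the text does not say which match is preferred when several basic units begin at pointer P.

```python
# Hypothetical contents of basic unit dictionary file 26 (k, m, s as in the text).
BASIC_UNITS = {"k", "m", "s"}
# Symbols treated as part of a unit and skipped: slash and middle dots.
SEPARATORS = {"/", "·", "・"}


def recognize_unit(key, basic_units=BASIC_UNITS):
    """Return the basic units that make up `key`, or None if it is unregistered."""
    if not key or not basic_units:
        return None
    max_len = max(len(unit) for unit in basic_units)
    p = 0                                   # step 200: pointer P at first character
    found = []
    while p < len(key):                     # step 208: does any string remain?
        if key[p] in SEPARATORS:            # skip "/" and similar symbols
            p += 1
            continue
        match = None
        # step 202: look for a basic unit starting at P (longest first, assumed)
        for length in range(min(max_len, len(key) - p), 0, -1):
            candidate = key[p:p + length]
            if candidate in basic_units:
                match = candidate
                break
        if match is None:                   # step 204 -> 212: no entry found
            return None
        found.append(match)
        p += len(match)                     # step 206: advance P by the unit length
    return found or None


print(recognize_unit("km/s"))   # ['k', 'm', 's']
print(recognize_unit("kx"))     # None
```

Because the sketch never backtracks, a dictionary with overlapping multi-character entries could reject a string that another segmentation would accept; the text does not address that case.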

When the unit recognition unit 24 has finished searching the basic unit dictionary file 26 and has successfully recognized the compound unit, it sends the resulting unit information to the processing unit 16, where it is stored in the dictionary information storage table 16a (210). Unit recognition then ends.

If, in step 204, the search of the basic unit dictionary file 26 for the character string starting at pointer P finds no entry, the string could not be recognized as a basic unit or a compound unit. The unit recognition unit 24 therefore sends the processing unit 16 information that the string is a word not registered in the dictionary, that is, that it does not represent a unit. This information is saved in the dictionary information storage table 16a of the processing unit 16 (212), and unit recognition ends.

Returning to FIG. 5, when unit recognition (112) ends, processing returns to step 102, and the processing unit 16 again extracts a dictionary lookup unit.

After extracting dictionary lookup units, the processing unit 16 determines whether any extracted unit remains (104). If no extracted unit, that is, no search key string, remains, the information stored in the dictionary information storage table 16a is output to the output device 30 through the output interface 18 (114). This completes the analysis of the input sentence.
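The overall loop of FIG. 5 (extract a search key, try the ordinary dictionary, fall back to unit recognition, record the result) might look like the sketch below. All names here are hypothetical. The table rows, the `"pos"` labels, and `toy_recognizer` are illustrative assumptions; `toy_recognizer` is only a stand-in for the unit recognition step and simply accepts strings built from k, m, s, and the slash.

```python
import re


def analyze(sentence, dictionary, recognize_unit):
    """Build a list standing in for dictionary information storage table 16a."""
    table = []
    # Steps 102/104: extract search keys (whitespace and commas assumed as delimiters).
    keys = [token for token in re.split(r"[\s,]+", sentence) if token]
    for key in keys:
        entry = dictionary.get(key)          # steps 106/108: ordinary dictionary lookup
        if entry is not None:
            table.append((key, entry))       # step 110: record grammatical information
            continue
        parts = recognize_unit(key)          # step 112: unit recognition fallback
        if parts:
            table.append((key, {"pos": "unit", "parts": parts}))
        else:
            table.append((key, {"pos": "unregistered"}))
    return table                             # step 114: hand the table to the output side


def toy_recognizer(key):
    """Stand-in recognizer: accepts only strings made of k, m, s and '/'."""
    if key and set(key) <= set("kms/"):
        return list(key.replace("/", ""))
    return None


rows = analyze("speed 30 km/s", {"speed": {"pos": "noun"}}, toy_recognizer)
for row in rows:
    print(row)
```

Passing the recognizer in as a parameter keeps the ordinary lookup and the unit fallback separate, mirroring the split between the dictionary search unit 20 and the unit recognition unit 24.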

As described above, in this embodiment an English input sentence is divided into search key strings, which are first looked up in the ordinary dictionary file 22. Unit recognition is performed when the dictionary file 22 has no entry. In unit recognition, the search key string is divided, each part is indicated with pointer P, and the basic unit dictionary file 26 is searched for each divided part. A string recorded in the basic unit dictionary file 26, or a string made up of a succession of strings recorded there, is judged to be a character string representing a unit.

Therefore, even a character string representing a complex unit can be recognized by combining the basic units stored in the basic unit dictionary file 26, so analysis can handle a wide variety of unit expressions. Moreover, the basic unit dictionary file 26 need store only basic units, such as k, m, and s. Compound units that combine them, such as km and km/s, need not be stored, so the capacity of the dictionary file can be reduced.
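To give a rough sense of the storage argument, the sketch below counts how many distinct strings arise from concatenating up to three of the basic units k, m, and s. The limit of three parts and the variable names are assumptions made purely for illustration, and most of the generated strings are not real units. The point is only that the number of combinations grows much faster than the number of basic entries.

```python
from itertools import product

basic_units = ["k", "m", "s"]   # entries a basic unit dictionary would hold
max_parts = 3                   # assumed upper bound on parts per compound

# Every concatenation of 1 to max_parts basic units, e.g. "k", "km", "kms".
compound_forms = [
    "".join(parts)
    for n in range(1, max_parts + 1)
    for parts in product(basic_units, repeat=n)
]

print(len(basic_units))      # 3
print(len(compound_forms))   # 39  (3 + 9 + 27)
```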

EFFECTS: According to the present invention, the basic dictionary means is searched for an input character string and, when a part of the character string is retrieved, the basic dictionary means is searched in the same way for the other parts of the character string, and the character string is thereby analyzed. Many kinds of character strings can therefore be analyzed without storing a large amount of data in the basic dictionary means.

[Brief explanation of the drawing]

FIG. 1: Block diagram showing an embodiment of the language analysis device according to the present invention.
FIG. 2: Example of the data stored in the dictionary file of FIG. 1.
FIG. 3: Example of the data stored in the basic unit dictionary file of FIG. 1.
FIG. 4: Example of the data stored in the dictionary information storage table of FIG. 1.
FIG. 5: Flowchart showing the operation of the device of FIG. 1.
FIG. 6: Flowchart showing unit recognition within the operation shown in FIG. 5.

Reference numerals: 16, processing unit; 20, dictionary search unit; 22, dictionary file; 24, unit recognition unit.

Claims

[Scope of Claims]

1. A language analysis device comprising: input means for inputting a character string in a predetermined language; basic dictionary means that stores basic data and is used for searching the input character string; and analysis means for analyzing the character string by searching the basic dictionary means for the input character string, characterized in that, when searching the basic dictionary means for the input character string retrieves a part of the character string, the analysis means analyzes the character string by searching the basic dictionary means in the same way for the other parts of the character string.

2. The device according to claim 1, characterized in that the basic dictionary means is a basic unit dictionary means that stores data representing units, and the analysis means analyzes whether the input character string represents a unit by searching the basic unit dictionary means for it.

3. The device according to claim 2, characterized in that the analysis means determines that the input character string is a character string representing a unit when the search of the basic unit dictionary means shows that it consists only of a combination of character strings representing units stored in the basic unit dictionary means.

4. The device according to any one of claims 1 to 3, characterized in that the analysis means has a pointer, sets the pointer to the first character of the input character string, and searches the basic dictionary means for a character string starting from the character to which the pointer is set and, when a part of the character string is retrieved, sets the pointer to the character string following the retrieved part and searches the basic dictionary means for the character string to which the pointer is set.

5. The device according to any one of claims 1 to 4, characterized in that the input character string is one that has been searched for in an ordinary dictionary means and is not stored in that dictionary means.