JPH0821031B2

JPH0821031B2 - Language analyzer

Info

Publication number: JPH0821031B2
Application number: JP61234328A
Authority: JP
Inventors: 壽彦横川
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1986-10-03
Filing date: 1986-10-03
Publication date: 1996-03-04
Anticipated expiration: 2011-03-04
Also published as: JPS6389976A

Description

[Detailed description of the invention]

Technical field

The present invention relates to a language analysis device, and more particularly to a language analysis device useful for an automatic translation device.

Prior art

When a Japanese sentence is to be produced from a corresponding sentence in a foreign language such as English, the morphemes of the input English sentence are analyzed, its syntax is analyzed, its sentence structure is converted, and a Japanese translation is then generated.

Specifically, the morphemes that make up the input sentence, such as its individual words, are analyzed by searching a dictionary, and information such as the part of speech is obtained for each morpheme. Based on this information, the structure of the input sentence is then analyzed (syntactic analysis) by applying grammar rules to determine the modification relations among the words, or among the blocks formed by those words. The sentence structure of the input sentence is then converted, on the basis of the analyzed syntax, into Japanese constituent order, Japanese morphemes are generated according to the converted order, and the Japanese sentence is produced.

In the morphological analysis performed for such translation, the part of speech and other information for each morpheme, such as a word, is obtained by searching a dictionary. Most ordinary words such as nouns and verbs can be stored in the dictionary, so they are easily found and their information is readily obtained.

However, expressions denoting units of length, velocity, acceleration and so on exist in a very large number of varieties, so storing all of them in the dictionary would needlessly enlarge the storage required for dictionary information and would be inefficient. This is because many such units, for example m/s and km/s, are composite expressions that combine several unit expressions.

On the other hand, a device that stores only some of these units in its dictionary cannot obtain information on the composite unit expressions contained in an input sentence. It therefore cannot complete the morphological analysis and may produce an incorrect language analysis.

Object

An object of the present invention is to eliminate these drawbacks of the prior art and to provide a language analysis device that can perform morphological analysis of an input sentence containing character strings made up of composite expressions, without storing all such character strings in a dictionary.

Configuration

To achieve the above object, the present invention comprises input means for inputting a character string in a predetermined language, basic dictionary means that stores basic data and is used to look up the input character string, and analysis means that analyzes the character string by searching the basic dictionary means for it. The invention is characterized in that, when a search of the basic dictionary means finds a part of the input character string, the analysis means analyzes the character string by searching the basic dictionary means in the same way for the remaining part of the character string. The invention is described concretely below on the basis of an embodiment.

FIG. 1 shows an embodiment in which the language analysis device according to the present invention is applied to an English-to-Japanese automatic translation device. Needless to say, the present invention can also be used for purposes other than morphological analysis of English input sentences, and it can be applied effectively not only to an English-to-Japanese translation device but also to any automatic translation device that translates one language into another.

This embodiment has an input unit 14, which receives data from an input device 10 or an input document file 12. The input device 10 includes, for example, a keyboard having character keys such as alphanumeric keys and function keys, or an optical character reader that reads English text recorded on paper. The input document file 12 is a storage device in which English text is recorded on a storage medium such as a magnetic disk.

The input unit 14 has an input character string buffer 14a, in which it stores the English input sentence received from the input device 10 or the input document file 12. The input unit 14 reads the input sentence out of the input character string buffer 14a and outputs it to the processing unit 16.

The processing unit 16 is a functional unit that performs morphological analysis of the input sentence sent from the input unit 14 by searching dictionary files. The processing unit 16 has a dictionary information storage table 16a, in which it stores the information obtained by searching the dictionary file 22 or the basic unit dictionary file 26, both described later.

From the character string that makes up the input sentence received from the input unit 14, the processing unit 16 extracts search key character strings, which are the units used when searching a dictionary. The search key character strings are located by scanning the input character string from its first character onward according to a predetermined rule. For example, the input sentence is divided, in order from the beginning of the sentence, at delimiters such as spaces and commas, and each resulting character string becomes a search key character string. In this case, a character string representing a unit, such as m, km or m/s, is itself treated as one search key character string. The processing unit 16 sends each search key character string extracted from the input sentence to the dictionary search unit 20.
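
The delimiter-based extraction of search key character strings can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `extract_search_keys` and the delimiter set (whitespace and commas) are hypothetical, and the text leaves the actual "predetermined rule" open.

```python
import re

# Assumed delimiter set: whitespace and commas, the two delimiters named in the text.
# A practical rule would probably also handle periods and other punctuation.
DELIMITERS = re.compile(r"[\s,]+")

def extract_search_keys(sentence):
    """Split an input sentence into search key strings, in order from its start."""
    # Empty pieces (for example at the very start or end) are dropped.
    return [key for key in DELIMITERS.split(sentence) if key]

keys = extract_search_keys("The car moves at 30 km/s, or 30000 m/s")
print(keys)
```

Note that a unit string such as km/s or m/s survives as a single search key, because the slash is not a delimiter.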

The dictionary search unit 20 searches the dictionary file 22 using the search key character string sent from the processing unit 16. As shown in FIG. 2, the dictionary file 22 stores entries together with grammatical information such as the part of speech. When the dictionary file 22 contains a matching entry, the dictionary search unit 20 reads out the part-of-speech and other information for that entry and outputs it to the processing unit 16. When the search finds no entry in the dictionary file 22, the dictionary search unit 20 reports that fact to the processing unit 16.

The processing unit 16 stores the part-of-speech and other information retrieved by the dictionary search unit 20 in the dictionary information storage table 16a. When the dictionary file 22 has no entry for a search key character string, the processing unit 16 outputs that search key character string to the unit recognition unit 24.
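
The choice between storing a dictionary hit and forwarding a miss to unit recognition might look like the minimal sketch below. The dictionary contents are invented for illustration, since the data of FIG. 2 is not reproduced in the text, and the names `WORD_DICTIONARY`, `lookup_word` and `route` are hypothetical.

```python
# Hypothetical entries standing in for the dictionary file 22.
WORD_DICTIONARY = {
    "car": {"pos": "noun"},
    "moves": {"pos": "verb"},
}

def lookup_word(key):
    """Return the grammatical information for key, or None when there is no entry."""
    return WORD_DICTIONARY.get(key)

def route(key):
    """Decide what happens to a search key after the word dictionary search."""
    info = lookup_word(key)
    if info is not None:
        # Entry found: the information goes into the dictionary information table.
        return ("store", info)
    # No entry: the key itself is handed on for unit recognition.
    return ("unit_recognition", key)

print(route("car"))
print(route("km/s"))
```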

The unit recognition unit 24 searches the basic unit dictionary file 26 using the search key character string sent from the processing unit 16. As shown in FIG. 3, the basic unit dictionary file 26 stores basic unit entries. When the basic unit dictionary file 26 contains a matching basic unit entry, the unit recognition unit 24 reads out that entry. When there is no matching entry, the search key character string is divided into several character strings, as described later, and the basic unit dictionary file 26 is searched once for each of them. If every one of these searches finds a basic unit entry, composite unit information is obtained from those entries. If any of the searches finds no basic unit entry, information indicating that the string is a word not registered in the dictionary is obtained.

The unit recognition unit 24 outputs the basic unit entry, the composite unit information, and the information indicating a word not registered in the dictionary to the processing unit 16. The processing unit 16 stores the information received from the unit recognition unit 24 in the dictionary information storage table 16a.

As shown in FIG. 4, the dictionary information storage table 16a stores the entry for each search key character string together with the grammatical information, such as the part of speech, obtained by searching the dictionary file 22 or the basic unit dictionary file 26 for that string. After these data have been stored in the dictionary information storage table 16a, the processing unit 16 outputs them, together with the input sentence, to the output interface 18. The output interface 18 outputs the input sentence and the morphological analysis data received from the processing unit 16 to an output device 30 such as a printer or display, or to a storage file 32 such as a magnetic disk.

Alternatively, the output interface 18 may be omitted: the input sentence and the morphological analysis data output from the processing unit 16 may be fed directly to syntactic analysis means (not shown), which analyzes the syntax of the input sentence, and a translation may then be generated on the basis of that syntactic analysis.

The control unit 28 controls the operation of each functional unit of the device and is advantageously implemented with a microprocessor.

The operation of the device is described with reference to the flowchart shown in FIG. 5.

First, an English input sentence is read into the input unit 14 from the input device 10 or the input document file 12 (100). The input sentence read into the input unit 14 is stored in the input character string buffer 14a, then read out of the buffer and output to the processing unit 16.

When the input sentence arrives, the processing unit 16 cuts out dictionary lookup units (102). That is, the character string making up the input sentence is divided, in order from its beginning and according to a predetermined rule, into search key character strings, which are the units used when searching the dictionary file 22 or the basic unit dictionary file 26. The processing unit 16 determines whether a search key character string has been obtained (104) and, if so, sends it to the dictionary search unit 20.

On receiving the search key character string, the dictionary search unit 20 searches the dictionary file 22 for it (106). The dictionary search unit 20 determines whether the search key character string appears among the entries of the dictionary file 22, such as those shown in FIG. 2 (108). If there is an entry, the grammatical information stored in the dictionary file 22, such as the part of speech, is read out, sent to the processing unit 16, and recorded in the dictionary information storage table 16a (110). The process then returns to step 102, and the next dictionary lookup unit is cut out.

If the dictionary file 22 has no entry, the dictionary search unit 20 returns the search key character string to the processing unit 16, which forwards it to the unit recognition unit 24, where unit recognition is performed (112).

When the search key character string sent to the dictionary search unit 20 is an ordinary word such as a noun or verb, the dictionary file 22 almost always has an entry for it, so grammatical information such as the part of speech is read from the dictionary file 22, sent to the processing unit 16, and recorded in the dictionary information storage table 16a. As described above, the dictionary file 22 contains entries for ordinary words such as nouns and verbs but none for character strings representing units. Therefore, when the search key character string represents a unit, such as km or m/s, the dictionary file 22 has no entry for it, and the process proceeds to step 112 for unit recognition.

The unit recognition operation of step 112 is described with reference to FIG. 6.

When a search key character string for which the dictionary file 22 had no entry is sent from the processing unit 16 to the unit recognition unit 24, the unit recognition unit 24 sets a pointer P to the first character of the search key character string (200).

Next, the unit recognition unit 24 searches the basic unit dictionary file 26 for the character string that begins at the character to which the pointer P is set (202). This search checks whether a basic unit with an entry in the basic unit dictionary file 26 appears in full within that character string, starting exactly at the character indicated by the pointer P. In other words, it checks whether a string of one or more characters beginning at the pointer P matches any of the basic units that have entries in the basic unit dictionary file 26. For example, when the character at the pointer P is k, m or s, the basic unit dictionary file 26 has an entry for that single character, as shown in FIG. 3.
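
The search of step 202 amounts to a prefix match anchored at the pointer P. A minimal sketch follows, assuming a basic unit set that contains only k, m and s (the full contents of FIG. 3 are not given in the text). The function name `match_basic_unit` and the choice to try longer candidates first are assumptions; the text does not say how overlapping entries would be resolved.

```python
# Only k, m and s are named in the text as basic unit entries.
BASIC_UNITS = {"k", "m", "s"}

def match_basic_unit(key, p):
    """Return the basic unit that starts exactly at index p of key, or None."""
    longest = max(len(unit) for unit in BASIC_UNITS)
    # Try candidates of decreasing length, all starting at position p.
    for length in range(min(longest, len(key) - p), 0, -1):
        candidate = key[p:p + length]
        if candidate in BASIC_UNITS:
            return candidate
    return None

print(match_basic_unit("km/s", 0))
print(match_basic_unit("km/s", 2))
```

With the pointer on the slash (index 2) no basic unit matches, which is why symbols such as the slash need separate handling.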

The unit recognition unit 24 determines whether the search of the basic unit dictionary file 26 found an entry (204). If an entry exists, it advances the pointer P by the length of the recognized basic unit (206). Thus, when the basic unit is k, m, s or the like, the pointer P is advanced by one character and set to the next character in the search key character string.

The unit recognition unit 24 then determines whether any character string remains beginning at the character to which the pointer P is set (208). If so, the process returns to step 202, and the basic unit dictionary file 26 is searched again for the character string beginning at the pointer P. The unit recognition unit 24 again determines whether the search of the basic unit dictionary file 26 found an entry (204) and, if so, advances the pointer P by the length of the recognized basic unit.

If, at step 208, no character string remains beginning at the pointer P, the search of the basic unit dictionary file 26 is finished and the composite unit has been recognized successfully.

For example, suppose the search key character string sent to the unit recognition unit 24 is km/s, which represents a unit. Because km/s is itself a composite unit, the basic unit dictionary file 26 has no entry for it. The pointer P is therefore first set to k (200), and the basic unit dictionary file 26 is searched for k to confirm that an entry exists (202).

Next, the pointer P is set to m (206), the basic unit dictionary file 26 is searched for m (202), and the existence of an entry is likewise confirmed. Because the unit recognition unit 24 regards symbols such as the slash (/) and the middle dot (・) as part of a unit, it then skips the / in km/s and sets the pointer P to s (206). The basic unit dictionary file 26 is then searched for s, and the existence of an entry is likewise confirmed (202). Since entries were found in the basic unit dictionary file 26 for all of k, m and s, km/s is judged to be a character string representing a unit. In this way, a search key character string is judged to represent a unit when the basic unit dictionary file 26 has entries for all of the characters that make it up, or for all of its characters other than symbols regarded as part of a unit, such as the slash and the middle dot.
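
The km/s walk-through can be traced with a short sketch of the pointer loop of FIG. 6. It is an illustration only: the names `recognize_unit` and `CONNECTORS` are hypothetical, the basic unit set is limited to the k, m and s named in the text, and trying longer matches first is an assumption.

```python
BASIC_UNITS = {"k", "m", "s"}      # basic unit entries named in the text
CONNECTORS = {"/", "・", "·"}      # slash and middle dot, regarded as part of a unit

def recognize_unit(key):
    """Return the basic units that make up key, or None if key is not a unit."""
    p = 0                          # step 200: pointer P on the first character
    parts = []
    while p < len(key):            # step 208: does a string still start at P?
        if key[p] in CONNECTORS:
            p += 1                 # skip the symbol without a dictionary search
            continue
        # Step 202: look for a basic unit starting exactly at P.
        match = next((key[p:p + n] for n in range(len(key) - p, 0, -1)
                      if key[p:p + n] in BASIC_UNITS), None)
        if match is None:          # step 204 fails: unregistered word (step 212)
            return None
        parts.append(match)
        p += len(match)            # step 206: advance P past the recognized unit
    return parts or None           # a key made only of connectors is not a unit

print(recognize_unit("km/s"))
print(recognize_unit("xyz"))
```

For km/s the loop visits k, m, skips the slash, then visits s, matching the sequence described above.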

When the unit recognition unit 24 has finished searching the basic unit dictionary file 26 and has successfully recognized the composite unit, it sends the resulting unit information to the processing unit 16, where it is stored in the dictionary information storage table 16a (210). Unit recognition then ends.

If, at step 204, the search of the basic unit dictionary file 26 for the character string beginning at the pointer P finds no entry, the character string could not be recognized as either a basic unit or a composite unit. The unit recognition unit 24 therefore sends the processing unit 16 information that the character string is a word not registered in the dictionary, that is, that it does not represent a unit. This information is stored in the dictionary information storage table 16a of the processing unit 16 (212), and unit recognition ends.

Returning to FIG. 5, when unit recognition (112) ends, the process returns to step 102 and the processing unit 16 again cuts out a dictionary lookup unit.

After cutting out dictionary lookup units, the processing unit 16 determines whether any cut-out unit remains (104). When no cut-out unit, that is, no search key character string, remains, the information stored in the dictionary information storage table 16a is output to the output device 30 through the output interface 18 (114). This completes the analysis of the input sentence.
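
Putting the steps of FIG. 5 together, the overall flow could be sketched as below. Everything here is an assumption made for illustration: the word dictionary entries, the labels "unit" and "unregistered", and the function names `is_unit` and `analyze` do not come from the text, and the table is modeled as a simple list of pairs.

```python
import re

WORD_DICTIONARY = {"light": "noun", "travels": "verb", "at": "preposition"}  # hypothetical
BASIC_UNITS = {"k", "m", "s"}
CONNECTORS = {"/", "・"}

def is_unit(key):
    """Unit recognition (step 112): True if key is built only from basic units."""
    p, found = 0, False
    while p < len(key):
        if key[p] in CONNECTORS:
            p += 1
            continue
        for n in range(len(key) - p, 0, -1):
            if key[p:p + n] in BASIC_UNITS:
                p, found = p + n, True
                break
        else:
            return False               # nothing matches at P: unregistered word
    return found

def analyze(sentence):
    table = []                         # stands in for the storage table 16a
    for key in filter(None, re.split(r"[\s,]+", sentence)):   # steps 102 and 104
        if key in WORD_DICTIONARY:                            # steps 106 and 108
            table.append((key, WORD_DICTIONARY[key]))         # step 110
        elif is_unit(key):                                    # step 112
            table.append((key, "unit"))
        else:
            table.append((key, "unregistered"))
    return table                       # step 114: contents handed to the output

for row in analyze("light travels at 300000 km/s"):
    print(row)
```

One consequence of this scheme, visible in the sketch, is that any concatenation of basic units is accepted as a unit, so a key such as "mks" would also be labeled a unit.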

As described above, in this embodiment an English input sentence is divided into search key character strings, each of which is first looked up in the ordinary dictionary file 22; unit recognition is performed when the dictionary file 22 has no entry. In unit recognition, the search key character string is divided, with the pointer P indicating the current position, and the basic unit dictionary file 26 is searched for each resulting piece. A string that is recorded in the basic unit dictionary file 26, or that consists of a sequence of items recorded there, is judged to be a character string representing a unit.

Therefore, even a character string representing a complex unit can be recognized as a unit by combining the basic units stored in the basic unit dictionary file 26, so a wide variety of unit expressions can be analyzed. Moreover, the basic unit dictionary file 26 need store only basic units such as k, m and s, and need not store the complex units formed by combining them, such as km and km/s, so the size of the dictionary file can be reduced.
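
The storage saving can be illustrated with a rough count. The counting model below is an assumption chosen for simplicity, not something stated in the text: it considers only two-part composites of the form a/b built from the three basic units k, m and s, and the name `composites` is hypothetical.

```python
from itertools import product

BASIC_UNITS = ["k", "m", "s"]      # the three basic units named in the text

def composites(basic, connector="/"):
    """Enumerate every two-part composite 'a/b' built from the basic units."""
    return [f"{a}{connector}{b}" for a, b in product(basic, repeat=2)]

# Entries needed when only basic units are stored.
stored_basic = len(BASIC_UNITS)
# Entries needed if every two-part composite were stored directly.
stored_composite = len(composites(BASIC_UNITS))

print(stored_basic, stored_composite)
```

Under this model the directly stored composites grow with the square of the number of basic units, and longer composites grow faster still, while the basic unit dictionary grows only linearly.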

Effect

According to the present invention, the basic dictionary means is searched for the input character string and, when a part of the character string is found, the basic dictionary means is searched in the same way for the remaining part, and the character string is thereby analyzed. Many kinds of character strings can therefore be analyzed without storing a large amount of data in the basic dictionary means.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of the language analysis device according to the present invention.
FIG. 2 is a diagram showing an example of the data stored in the dictionary file of FIG. 1.
FIG. 3 is a diagram showing an example of the data stored in the basic unit dictionary file of FIG. 1.
FIG. 4 is a diagram showing an example of the data stored in the dictionary information storage table of FIG. 1.
FIG. 5 is a flowchart showing the operation of the device of FIG. 1.
FIG. 6 is a flowchart showing the unit recognition portion of the operation shown in FIG. 5.

Reference numerals of main parts
14: input unit
16: processing unit
20: dictionary search unit
22: dictionary file
24: unit recognition unit
26: basic unit dictionary file

Claims

[Claims]

1. A language analysis device comprising: input means for inputting a character string in a predetermined language; word dictionary means for storing word data; basic unit dictionary means for storing data representing units; and analysis means for analyzing the input character string, wherein the analysis means searches the word dictionary means for the input character string and, when the input character string is not present in the word dictionary means, searches the basic unit dictionary means; when a part of the character string is found in the basic unit dictionary means, the analysis means similarly searches the basic unit dictionary means for the other part of the character string; and, even when that other part is not present in the basic unit dictionary means, a slash and a middle dot are regarded as part of a unit, so that the combination of the part of the character string and the other part, including the slash or middle dot, is determined to be a character string representing a unit.