JPS60225273A - Word retrieving system

Info

Publication number: JPS60225273A
Application number: JP59081165A
Authority: JP
Inventors: Shunichi Fukushima; 俊一福島
Original assignee: Agency of Industrial Science and Technology
Current assignee: National Institute of Advanced Industrial Science and Technology AIST
Priority date: 1984-04-24
Filing date: 1984-04-24
Publication date: 1985-11-09
Also published as: JPH0233185B2

Abstract

PURPOSE: To realize more flexible and more efficient word retrieval by collating the part preceding a key character of a given character string and the part succeeding this key character with a word dictionary. CONSTITUTION: A character string reading means 31 is started, and the character string to be subjected to word retrieval is written to a character string storage means 32. The degree of reliability of character recognition for each character is sent to a key character setting means 33, which determines the key character. A key character retrieving means 34 is started to retrieve words having the key character in key character parts 211. A succeeding expression part collating means 36 is started to collate the partial character string succeeding the key character with succeeding expression parts 213 of the word dictionary. A preceding expression part collating means 35 is then started to collate the partial character string preceding the key character with preceding expression parts 212 of the individual words in turn, and each word coinciding with the input character string is sent back to a retrieval control means 38.

Description

[Detailed Description of the Invention] The present invention relates to a word search method that searches a word dictionary for words whose notation matches a substring of a given character string.

Unlike English and similar languages, Japanese text is not customarily written with spaces between words; it is normally written as a continuous run of characters. Therefore, when a Japanese sentence is analyzed mechanically with a computer or the like, the first step is to find the words in the given character string by searching a word dictionary for words whose notation matches a substring of that string.

FIG. 1 shows an example of a word dictionary used for such a word search. Each record, corresponding to one word, has a notation part 11 and a word information part 12, and the records are stored in order of the character code string of the notation part 11.

The word information part 12 holds part-of-speech information, which is consulted during grammatical analysis after the word search, and reading and accent information, which is used for speech output; none of it is needed for the word search itself. The conventional word search method uses such a word dictionary and compares the character code string of the given character string with the character code strings of the notation parts in the dictionary, starting from the first character code, thereby detecting words whose notation matches the beginning of the given character string.

For example, for the character string 「雨季になると」 ("when the rainy season comes"), the words 〈雨季〉 ("rainy season") and 〈雨〉 ("rain") are detected. By applying this method repeatedly to the string that immediately follows each detected word, the given character string can be divided into words. However, because the method only detects words whose notation matches from the beginning, it cannot be applied efficiently when the starting position of the string to be searched is ambiguous, for example when the given string has multiple candidates per character (as in a character recognition result), when a word not contained in the dictionary appears, or when several analysis results exist.
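The conventional head-anchored search described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the toy word list `NOTATIONS` and the function name `prefix_words` are assumptions introduced here, and a sorted Python list stands in for the dictionary of FIG. 1.

```python
from bisect import bisect_left

# Hypothetical toy dictionary: notations kept sorted by character code,
# mirroring the ordering of the notation part in FIG. 1.
NOTATIONS = sorted(["雨", "雨季", "雨量", "降雨量"])

def prefix_words(text: str, notations: list[str] = NOTATIONS) -> list[str]:
    """Return every dictionary word that matches the beginning of `text`."""
    found = []
    for length in range(1, len(text) + 1):
        candidate = text[:length]
        # Binary search for an exact match of this prefix.
        i = bisect_left(notations, candidate)
        if i < len(notations) and notations[i] == candidate:
            found.append(candidate)
    return found

print(prefix_words("雨季になると"))  # ['雨', '雨季']
print(prefix_words("ざあっとにわか雨"))  # [] because the head is not in the dictionary
```

The second call shows the weakness the text points out: when the head of the string is unregistered or uncertain, nothing is found, even though a registered word occurs later in the string.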

The object of the present invention is to eliminate this drawback of the conventional method and to provide a flexible word search method in which the matching of the given character string against the notation parts of the word dictionary is not restricted to start from the beginning of the string.

That is, the present invention is a word search method for searching a word dictionary for words whose notation matches a substring of a given character string. The notation part in the word dictionary is provided with a key character part that serves as the key in word search, a forward notation part that is the portion of the notation preceding the key character part, and a backward notation part that is the portion of the notation following the key character part. The method uses a key character setting means that sets, in the given character string, a key character to be matched against the key character part; a key character search means that searches the word dictionary by matching the key character set in the given character string against the key character part; a forward notation part collation means that matches the substring preceding the key character in the given character string against the forward notation part; and a backward notation part collation means that matches the substring following the key character in the given character string against the backward notation part. The method is characterized in that the matching of the given character string against the notation part in the word dictionary proceeds forward and backward from the key character. The present invention is described in detail below with reference to the drawings.

FIG. 2 shows an example of the word dictionary used in the word search method of the present invention. Each record, corresponding to one word, has a notation part 21 and a word information part 22 (reading, part of speech, accent, and so on). The notation part 21 is further divided into a key character part 211, which serves as the key in word search; a forward notation part 212, which is the portion of the notation preceding the key character part; and a backward notation part 213, which is the portion of the notation following the key character part.

The present invention places no restriction on which character of a word's notation is registered in the key character part. In this example, a word such as 〈降雨量〉 ("rainfall amount") is registered so that it can be retrieved with any character of its notation as the key. Other schemes are also conceivable, such as registering the last kanji of the notation in the key character part, or registering the first character of the notation in the key character part (the latter coincides with the conventional word dictionary). In this example, the records are stored first in order of the character code of the key character part 211; records with the same key character part are arranged in order of the character code string of the backward notation part 213, and then in order of the character code string of the forward notation part 212. As in the conventional word dictionary 1 described above, the word information part 22 is consulted during grammatical analysis and speech output after the word search and is not used within the word search itself.
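The record layout and ordering of FIG. 2 can be illustrated with a short sketch. The names `Record`, `expand`, and `build_dictionary` are hypothetical, and registering one record per character of the notation is only the scheme used in this example, not a requirement of the invention.

```python
from typing import NamedTuple

class Record(NamedTuple):
    key: str    # key character part (211)
    back: str   # backward notation part (213)
    front: str  # forward notation part (212)
    word: str   # full notation, kept here only for readability

def expand(word: str) -> list[Record]:
    """Register `word` once per character so any character can act as the key."""
    return [Record(word[i], word[i + 1:], word[:i], word) for i in range(len(word))]

def build_dictionary(words: list[str]) -> list[Record]:
    # Field order (key, back, front) reproduces the sort order described above:
    # key character first, then backward part, then forward part.
    return sorted(r for w in words for r in expand(w))

for r in expand("降雨量"):
    print(r.key, repr(r.front), repr(r.back))
# 降 '' '雨量'
# 雨 '降' '量'
# 量 '降雨' ''
```

Placing `back` before `front` in the tuple is a design choice that lets the default tuple comparison produce the ordering of this embodiment without a custom sort key.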

FIG. 3 is a block diagram showing one embodiment of an apparatus that implements the word search method of the present invention. In FIG. 3, 31 is a character string reading means that writes the given character string 320 into the character string storage means 32. The given character string is the portion, selected as the target of the word search, of a character string that has been converted into a character code string through an input device such as a keyboard, an OCR, or a magnetic tape device. If information needed to set the key character in the given character string is available, the character string reading means 31 also sends that information (for example, the recognition confidence of each character when the string was input from an OCR) to the key character setting means 33. 32 is a character string storage means that stores the given character string 320; an IC memory, a magnetic disk device, a magnetic tape, or the like is used for it. 20 is a word dictionary storage means that stores a word dictionary 2 such as that shown in FIG. 2, whose notation part has a key character part 211, a forward notation part 212, and a backward notation part 213; like 32, it uses an IC memory, a magnetic disk device, a magnetic tape device, or the like.

33 is a key character setting means that sets, in the given character string 320, a key character 321 to be matched against the key character part 211 of the word dictionary 2. For example, a point where the character type changes in the given string, or, if the string was input from an OCR, a character with high recognition confidence, is set as the key character. The present invention, however, places no restriction on the conditions for setting the key character. If a character string in which the key character has already been set is read in, the character string reading means 31 and the key character setting means 33 can be realized as a single component.
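The two key character policies mentioned above can be sketched as follows. Both functions are illustrative assumptions: the source does not define how a "change point in character type" is computed, so `kanji_run_starts` takes one possible reading (a kanji that follows a non-kanji character), and the script classification relies on Unicode character names from the standard `unicodedata` module.

```python
import unicodedata

def key_by_confidence(confidences: list[float]) -> int:
    """Index of the character with the highest recognition confidence."""
    return max(range(len(confidences)), key=confidences.__getitem__)

def script_of(ch: str) -> str:
    """Coarse character type derived from the Unicode character name."""
    name = unicodedata.name(ch, "")
    if name.startswith("HIRAGANA"):
        return "hiragana"
    if name.startswith("KATAKANA"):
        return "katakana"
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "kanji"
    return "other"

def kanji_run_starts(text: str) -> list[int]:
    """Indices where a kanji follows a non-kanji character (or starts the text)."""
    return [
        i for i, ch in enumerate(text)
        if script_of(ch) == "kanji" and (i == 0 or script_of(text[i - 1]) != "kanji")
    ]

text = "ざあっとにわか雨が降り出した。"
print([text[i] for i in kanji_run_starts(text)])  # ['雨', '降', '出']
```

With per-character confidences such as `[0.95, 0.98, 0.96, 0.96, 0.94]`, `key_by_confidence` returns index 1.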

34 is a key character search means that searches the word dictionary 2 by matching the key character 321 against the key character part 211. 35 is a forward notation part collation means that matches the substring 322 preceding the key character in the given character string 320 against the forward notation part 212 of the word dictionary. 36 is a backward notation part collation means that matches the substring 323 following the key character in the given character string 320 against the backward notation part 213 of the word dictionary. The key character search method used by the key character search means 34, and the string matching methods used by the forward notation part collation means 35 and the backward notation part collation means 36, are the same as conventional, publicly known key search and string matching methods. 37 is an information output means that, when a word whose notation matches a substring of the given character string 320 is detected, outputs the word information for that word. The operation of each of these components is controlled by the search control means 38, which carries out the series of operations shown in the flowchart of FIG. 4.

The order of STEP 4 and STEP 5 in the flowchart of FIG. 4 follows from how the words are arranged in the word dictionary 2. In this embodiment, as described above, words with the same key character part are arranged first in order of the character code string of the backward notation part and then in order of the character code string of the forward notation part, so it is appropriate to match the backward notation part first (STEP 4) and the forward notation part second (STEP 5). If, instead, words with the same key character part were arranged first in order of the character code string of the forward notation part and then in order of that of the backward notation part, it would be more appropriate to swap STEP 4 and STEP 5.

As an example, a word search on the character string 「降雨量測定」 ("rainfall amount measurement") input from an OCR is described below, following the steps of the flowchart of FIG. 4.

(STEP 1: 100 in FIG. 4) The character string reading means 31 is activated, and the character string 「降雨量測定」, which is the target of the word search among the character strings input from the OCR, is written to the character string storage means 32. As information for setting the key character, the character recognition confidence of each character of 「降雨量測定」 is sent to the key character setting means 33. The following steps are described assuming that the confidence of each character is given as follows.

降 0.95
雨 0.98
量 0.96
測 0.96
定 0.94

(STEP 2: 101 in FIG. 4) The key character setting means 33 is activated and sets the key character 321 in 「降雨量測定」. Here, 「雨」, which has the highest character recognition confidence, is assumed to be set as the key character.

(STEP 3: 102 in FIG. 4) The key character search means 34 is activated and searches for words that have the key character 「雨」 in the key character part 211. In the word dictionary 2 of FIG. 2, the words 〈雨〉, 〈にわか雨〉, ..., 〈雨量〉, 〈降雨量〉, ... are found, and the first and last positions in the word dictionary of the group of words having 「雨」 in the key character part 211 are returned to the search control means 38. If no word having 「雨」 in the key character part 211 exists in the word dictionary 2, a search failure is returned to the search control means 38 and processing ends.

(STEP 4: 103 in FIG. 4) The backward notation part collation means 36 is activated and matches the string 「量測定」 (323), which follows the key character 「雨」 (321) in 「降雨量測定」, against the backward notation part 213 of the word dictionary. This matching is performed on the words between the first and last positions, obtained in STEP 3, of the group of words having 「雨」 in the key character part. Words having 「量測定」 in the backward notation part 213, words having 「量測」, words having 「量」, and words having no backward notation part (words whose backward notation part 213 is blank) are each looked up, and their positions in the word dictionary 2 are returned to the search control means 38. That is, the positions in the word dictionary 2 of words in which 「雨」 is the last character, such as 〈雨〉, 〈にわか雨〉, and 〈小雨〉, and of the words 〈雨量〉 and 〈降雨量〉 are returned to the search control means 38.

(STEP 5: 104 in FIG. 4) The forward notation part collation means 35 is activated. For each word obtained in STEP 4 in turn, the forward notation part 212 is matched against the string 「降」 (322), which precedes the key character 「雨」 (321) in 「降雨量測定」, and the positions of the matching words are returned to the search control means 38. That is, the position of 〈降雨量〉 in the word dictionary 2 is returned to the search control means 38. At the same time, for the words obtained in STEP 4 that have no forward notation part (words whose forward notation part 212 is blank), namely 〈雨〉 and 〈雨量〉, the positions in the word dictionary 2 are also returned to the search control means 38. Consequently, the words 〈雨〉, 〈雨量〉, and 〈降雨量〉 are detected in STEP 5. If no word is detected, a detection failure is returned to the search control means 38 and processing ends.

(STEP 6: 105 in FIG. 4) The information output means 37 is activated and outputs the word information 22 (reading, part of speech, accent, and so on) for the words 〈雨〉, 〈雨量〉, and 〈降雨量〉 detected in STEP 5.
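STEP 3 through STEP 5 of the walkthrough can be traced with a small sketch. The word list `WORDS` and the function `search` are assumptions made for illustration; the records follow the (key, backward part, forward part) ordering of the embodiment. For brevity, STEP 4 scans the key group linearly, whereas the apparatus described here would exploit the sorted order of the backward notation part.

```python
from bisect import bisect_left, bisect_right

# Hypothetical toy dictionary. Each record is (key, back, front), and the list
# is sorted in that order, as in the embodiment of FIG. 2.
WORDS = ["雨", "にわか雨", "小雨", "雨量", "降雨量", "雨季"]
RECORDS = sorted((w[i], w[i + 1:], w[:i]) for w in WORDS for i in range(len(w)))
KEYS = [r[0] for r in RECORDS]

def search(text: str, key_index: int) -> list[str]:
    """Find dictionary words in `text` that contain the character at `key_index`."""
    key = text[key_index]
    before, after = text[:key_index], text[key_index + 1:]

    # STEP 3: locate the first and last positions of the key character group.
    lo, hi = bisect_left(KEYS, key), bisect_right(KEYS, key)
    if lo == hi:
        return []  # search failure: no word has this key character

    # STEP 4: the backward part must match the text right after the key.
    # An empty backward part always matches.
    step4 = [r for r in RECORDS[lo:hi] if after.startswith(r[1])]

    # STEP 5: the forward part must match the text right before the key.
    # An empty forward part always matches.
    return [front + k + back for (k, back, front) in step4 if before.endswith(front)]

print(search("降雨量測定", 1))  # ['雨', '雨量', '降雨量']
```

The result agrees with the words detected in STEP 5 of the text. Calling `search` with an index whose character has no records returns an empty list, corresponding to the search failure case of STEP 3.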

In general, the words obtained by the word search are subsequently narrowed down to the correct one by, for example, a mechanical connectivity check based on the part-of-speech information of the words, or selection by the user.

A character string input from an OCR may have several recognition candidates per character. If the word search does not yield a correct Japanese word, the character suspected of being misrecognized must be replaced with another recognition candidate and the word search repeated. With the word search method of the present invention, matching is not restricted to start from the beginning of the string and can start from any position. It is therefore possible to fix a character with high recognition confidence and search while replacing the ambiguous characters before and after it, so that word search on ambiguous character strings can be carried out efficiently.

The embodiment above concerns word search on character strings input from an OCR, but the word search method of the present invention also enables efficient word search on character strings input from a keyboard, magnetic tape, or the like. For example, consider analyzing the mixed kanji and kana sentence 「ざあっとにわか雨が降り出した。」 ("A sudden shower came pouring down."). If the word 〈ざあっと〉 is not registered in the word dictionary, the conventional method, after failing to find a word at the head of the sentence, has to repeat the word search by trial and error while shifting the starting position one character at a time. With the word search method of the present invention, a word search keyed on 「雨」, a point where the character type changes, finds 〈雨〉 and 〈にわか雨〉. By repeating the word search with the characters immediately before and after the found words as key characters, the analysis can proceed without loss of efficiency even when unregistered words appear.

The word dictionary used in the word search method of the present invention is not limited to the structure shown in FIG. 2; structures such as those in FIG. 5 or FIG. 6 are also possible. FIG. 5 shows a word dictionary in which the key character part is separated out as a table. 50 is the key character part table, which consists of a key character part 501 and a pointer 502 to the first position of the group of words having that key character part.

The word dictionary 5 has a forward notation part 51, a backward notation part 52, and a word information part 53. With this dictionary structure, the position of the group of words having the same key character part can be obtained directly from the pointer 502, so the search speed is further improved.
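The separated key table of FIG. 5 can be sketched as a mapping from each key character to the position of its first record. The names `build_key_table` and `group` are hypothetical, and the end of a group is derived here from the next larger pointer, which is an assumption: the text only mentions a pointer to the first position.

```python
def build_key_table(words: list[str]) -> tuple[dict[str, int], list[tuple[str, str]]]:
    """Return (key table, dictionary body) in the spirit of FIG. 5."""
    records = sorted((w[i], w[i + 1:], w[:i]) for w in words for i in range(len(w)))
    key_table: dict[str, int] = {}       # key character (501) -> pointer (502)
    body: list[tuple[str, str]] = []     # rows of (forward part 51, backward part 52)
    for pos, (key, back, front) in enumerate(records):
        key_table.setdefault(key, pos)   # remember only the first position of each group
        body.append((front, back))       # the key character is no longer repeated per row
    return key_table, body

def group(key_table: dict[str, int], body: list[tuple[str, str]], key: str) -> list[tuple[str, str]]:
    """Rows of the word group that shares `key`, located through the pointer."""
    if key not in key_table:
        return []
    start = key_table[key]
    # Assumption: the group ends where the next group begins.
    end = min((p for p in key_table.values() if p > start), default=len(body))
    return body[start:end]

table, body = build_key_table(["雨", "雨量", "降雨量"])
print(group(table, body, "雨"))  # [('', ''), ('', '量'), ('降', '量')]
```

Compared with a binary search over the whole dictionary, the table lookup reaches the key group in one step, which is the speed benefit the text attributes to this structure.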

FIG. 6 shows a word dictionary in which the key character part and the backward notation part are organized as a tree. 60 is the key character part table, which consists of a key character part 601, a pointer 602 to the table for the first character of the backward notation part, and a pointer 603 to the words that have that key character and no backward notation part. 61 is the backward notation part, which has a table 611 for the first character of the backward notation part, a table 612 for the second character, and so on; each table consists of a notation part, a pointer to the next table, and a pointer to words.
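The tree structure of FIG. 6 resembles a trie over the backward notation part, rooted at the key character. The sketch below is one possible rendering under that assumption; nested dictionaries stand in for the per-character tables, and `build_tree` and `search_tree` are hypothetical names.

```python
def build_tree(words: list[str]) -> dict:
    """Key character -> trie over the backward part; nodes list the forward parts."""
    tree: dict = {}
    for w in words:
        for i, key in enumerate(w):
            # Entry in the key character table (60).
            node = tree.setdefault(key, {"next": {}, "words": []})
            # One table level per character of the backward part (611, 612, ...).
            for ch in w[i + 1:]:
                node = node["next"].setdefault(ch, {"next": {}, "words": []})
            node["words"].append(w[:i])  # forward part of a word ending at this node
    return tree

def search_tree(tree: dict, text: str, key_index: int) -> list[str]:
    key, before = text[key_index], text[:key_index]
    node = tree.get(key)
    found, back, pos = [], "", key_index + 1
    while node is not None:
        # Words whose backward part equals the characters consumed so far.
        for front in node["words"]:
            if before.endswith(front):
                found.append(front + key + back)
        if pos >= len(text):
            break
        # Follow the pointer for the next character after the key, if any.
        node = node["next"].get(text[pos])
        back += text[pos]
        pos += 1
    return found

tree = build_tree(["雨", "にわか雨", "小雨", "雨量", "降雨量", "雨季"])
print(search_tree(tree, "降雨量測定", 1))  # ['雨', '雨量', '降雨量']
```

Walking the tree consumes the text after the key one character at a time, so all backward parts 「量測定」, 「量測」, 「量」, and the empty one are checked in a single descent.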

As described above, according to the word search method of the present invention, the matching of the given character string against the word dictionary is not restricted to start from the beginning of the string; a character can be chosen as the key character and matching can proceed forward and backward from it. Compared with the conventional word search method, which matches from the beginning of the string, this achieves more flexible and more efficient word search.

[Brief explanation of the drawings]

FIG. 1 shows an example of the word dictionary used in the conventional word search method; FIGS. 2, 5, and 6 show examples of the word dictionary used in the word search method of the present invention; FIG. 3 is a block diagram showing one embodiment of an apparatus that implements the word search method of the present invention; and FIG. 4 is a flowchart explaining the embodiment of FIG. 3. In the figures:

1, 2, 5, 6: word dictionary
11, 21: notation part
12, 22, 53, 63: word information part
211, 501, 601: key character part
212, 51, 62: forward notation part
213, 52, 61: backward notation part
31: character string reading means
32: character string storage means
33: key character setting means
34: key character search means
35: forward notation part collation means
36: backward notation part collation means
37: information output means
38: search control means

Claims

A word search method for searching a word dictionary for a word whose notation matches a substring of a given character string, wherein the notation part in the word dictionary is provided with a key character part that serves as the key in word search, a forward notation part that is the portion of the notation preceding the key character part, and a backward notation part that is the portion of the notation following the key character part; and wherein, using a key character setting means for setting, in the given character string, a key character to be matched against the key character part, a key character search means for searching the word dictionary by matching the key character set in the given character string against the key character part, a forward notation part collation means for matching the substring preceding the key character in the given character string against the forward notation part, and a backward notation part collation means for matching the substring following the key character in the given character string against the backward notation part, the given character string is matched against the notation part in the word dictionary forward and backward from the key character.