JPH03177971A

JPH03177971A - Retrieving method by abbreviation

Info

Publication number: JPH03177971A
Application number: JP1318473A
Authority: JP
Inventors: Takeshi Maruko; 圓子　雄
Original assignee: Fuji Electric Co Ltd; Fuji Facom Corp
Current assignee: Fuji Electric Co Ltd; Fuji Facom Corp
Priority date: 1989-12-07
Filing date: 1989-12-07
Publication date: 1991-08-01

Abstract

PURPOSE: To improve retrieval efficiency by judging a database retrievable when any one of its target character strings, each having a prescribed number of characters exceeding the total number of character units of the abbreviation, includes every character unit in the same order as in the character string of the abbreviation. CONSTITUTION: Target character strings 2 are runs of consecutive characters, taken in their original order from the database 9 to be retrieved, each with a prescribed number of characters exceeding the total number of character units of the abbreviation. When any one of these target character strings includes every character unit in the same order as in the character string of the abbreviation, the database is judged retrievable. In a conventional system, when a character-coded database could not be retrieved with a certain abbreviation, retrieval also had to be attempted with the several formal or semi-formal words related to that abbreviation. Because retrieval can now be performed directly with the abbreviation alone, retrieval efficiency is improved.

Description

[Detailed description of the invention] [Industrial application field]

The present invention relates to an abbreviation-based search method for searching a character-coded database on the basis of an abbreviation that forms a character string.

[Conventional technology]

In general, a character-coded database is written with a mixture of several word forms, such as formal words, semi-formal words, and abbreviations or shortened words. Conventionally, therefore, the following methods have been used to search a database on the basis of an abbreviation or shortened word (hereinafter simply called an abbreviation). In one, abbreviations and their corresponding formal or semi-formal words are registered in advance in a correspondence table, and the database is judged searchable or not searchable according to whether the formal or semi-formal word corresponding to the abbreviation is contained in it. In the other, whenever a search is judged not searchable, the formal or semi-formal words corresponding to the abbreviation are entered one after another.
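The correspondence-table approach described above can be sketched roughly as follows. This is only an illustration: the table contents, the name `CORRESPONDENCE`, and the function `conventional_search` are hypothetical and do not come from the source.

```python
# Hypothetical correspondence table: abbreviation -> formal / semi-formal words.
# The entries are illustrative placeholders, not data from the source.
CORRESPONDENCE = {
    "AB": ["Alpha Beta Co., Ltd.", "Alpha Beta K.K.", "Alpha Beta"],
}


def conventional_search(database: str, abbreviation: str) -> bool:
    """Return True if the abbreviation, or any word registered for it,
    occurs verbatim in the database text."""
    candidates = [abbreviation] + CORRESPONDENCE.get(abbreviation, [])
    # One verbatim lookup per candidate word: this is the repeated
    # matching effort that the text describes as laborious.
    return any(word in database for word in candidates)
```

Every spelling variant has to be registered in advance, and each one costs a separate lookup, which is the inefficiency the following section sets out to remove.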

[Problem to be solved by the invention]

As explained above, with the conventional technology, when a search with a certain abbreviation fails, it is necessary to search again using the several formal or semi-formal words related to that abbreviation. The matching procedure is correspondingly laborious and troublesome, and search efficiency is impaired. For example, when the abbreviation is [XX], there are four kinds of corresponding words: its formal word [XXXX Co., Ltd.] and its semi-formal words [XXXX K.K], [XXXX■], and [XXXX]. Therefore, when a search with [XX] fails, it is also necessary to search with [XXXX Co., Ltd.], [XXXX K.K], [XXXX■], and [XXXX]; in short, searches with a total of five kinds of words (character strings) are needed.

The object of this invention is to eliminate the above problems of the conventional technology and to provide an abbreviation-based search method that improves search efficiency when searching a character-coded database on the basis of an abbreviation forming a character string.

[Means to solve the problem]

To solve this problem, the abbreviation-based search method according to the present invention is a method of searching a character-coded database on the basis of an abbreviation forming a character string, in which: the character string of the abbreviation is divided into individual character units; target character strings are selected from the database to be searched, each consisting of consecutive characters in their original order and having a predetermined number of characters exceeding the total number of the character units; and the database is judged searchable when any one of these target character strings contains each of the character units in the same order as in the character string of the abbreviation.
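The three steps above (split the abbreviation, take fixed-length target strings, test for in-order containment) can be sketched as follows. This is a minimal illustration under stated assumptions: the names `contains_in_order`, `searchable`, and `window_len` are hypothetical, each character unit is taken to be one character, and a database shorter than the window is treated as a single target string.

```python
from typing import Sequence


def contains_in_order(window: str, units: Sequence[str]) -> bool:
    """True if every unit occurs in `window` in the given order (gaps allowed)."""
    pos = 0
    for unit in units:
        pos = window.find(unit, pos)
        if pos < 0:
            return False
        pos += len(unit)  # the next unit must come after this one
    return True


def searchable(database: str, abbreviation: str, window_len: int) -> bool:
    """Judge the database searchable if any target string of `window_len`
    consecutive characters contains the abbreviation's units in order."""
    units = list(abbreviation)  # assumption: one character per unit
    if window_len <= len(units):
        raise ValueError("window_len must exceed the number of character units")
    # Target strings: runs of window_len characters, shifted one character at a time.
    last_start = max(len(database) - window_len, 0)
    return any(
        contains_in_order(database[start:start + window_len], units)
        for start in range(last_start + 1)
    )
```

For instance, `searchable("used AxyB today", "AB", 5)` is true because the target string `"AxyB "` contains `A` followed by `B`, whereas the same call on `"used BxyA today"` is false because the order is reversed.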

[Operation]

When any one of the target character strings, which consist of consecutive characters in their original order in the database to be searched and have a predetermined number of characters exceeding the total number of the character units, contains each character unit in the same order as in the character string of the abbreviation, that database is judged searchable.

[Embodiment]

An application example of the abbreviation-based search method according to the present invention is described with reference to FIG. 1, a conceptual diagram based on that example.

In FIG. 1, abbreviation 1, [XX], is divided into its constituent first unit character 1A [X] and second unit character 1B [X]. Meanwhile, database 9 to be searched, "... used XXXX for a business trip for ...", is cut into consecutive character strings of five characters each, shifted one character at a time. Each of these is called a target character string (that is, a character string that is the target of the search) and is given the common reference numeral 2. The length of target character string 2, five characters, is chosen as a suitable number exceeding the total number of unit characters of abbreviation 1, which is two, mainly with reference to the number of characters in the formal or semi-formal words related to abbreviation 1.

In FIG. 1, the target character strings 2 fall into three kinds: (1) those containing neither of the unit characters [X], [X], shown with a broken underline; (2) those containing one of the unit characters [X], [X], shown with a dash-dot underline; and (3) those containing both of the unit characters [X], [X], shown with a solid underline. Database 9 is judged searchable (OK) only when a target character string of kind (3) exists, and is judged not searchable (NG) when only target character strings of kinds (1) and (2) exist. In other words, the result is searchable only when the character units [X], [X] of the abbreviation are contained in some target character string in the same order as in the character string of the abbreviation.

There are two schemes for examining the target character strings of database 9 to find out whether any of them is of kind (3), containing both unit characters [X], [X]. In the first scheme, target character strings of five consecutive characters are selected in turn, starting from the first character of the database, and the database is judged searchable when the unit characters [X], [X] are contained in one of them in that order. In the second scheme, the database is searched sequentially for the first unit character [X]; when it is found, a target character string of five consecutive characters starting at that character is selected, and the database is judged searchable when the next unit character [X] is contained in it.

The search operation of the first scheme is described with reference to the flowchart of FIG. 2 and the conceptual diagram of FIG. 1.

In FIG. 2, counter i is initialized to i = 1 in step S1, and counter j is initialized to j = 1 in step S2. Here, i denotes the position of each character in the database counted from its beginning, and j denotes the position of each character within the five-character target character string counted from its beginning.

In step S3, it is determined whether the character Li belonging to the target character string is the first character unit [X] X1 of abbreviation 1 [XX]. If YES, the process moves to step S4; if NO, it moves to step S6.

In step S4, it is determined whether the character Li+j of the target character string is the second character unit [X] X2 of abbreviation 1 [XX]. If YES, the process moves to step S5; if NO, it moves to step S7. In step S5, the result is judged "searchable (OK)" and the process ends.

In step S6, it is determined whether the character Li+j of the target character string is the first character unit [X] X1 of abbreviation 1 [XX]. If YES, the process moves to step S7; if NO, it moves to step S9.

In step S7, j is incremented, and in step S8 it is determined whether j exceeds 4. If it does, the process moves to step S11; otherwise it returns to step S4. Steps S7 and S8 form the procedure that checks sequentially whether the character unit [X] X2 is contained in the target character string.

In step S9, j is likewise incremented, and in step S10 it is determined whether j exceeds 3. If it does, the process moves to step S11; otherwise it returns to step S6. Steps S9 and S10 form the procedure that checks sequentially whether the character unit [X] X1 is contained in the target character string.

Steps S11 and S12 repeat the above procedure for all n characters of the database. If the result in step S12 is NO, the process returns to step S2 and the same procedure is applied to the next target character string. If YES, the result is judged "not searchable (NG)" in step S13 and the process ends.

The search operation of the second scheme is described with reference to the flowchart of FIG. 3 and the conceptual diagram of FIG. 1.

In FIG. 3, counter i is initialized to i = 1 in step S1. In step S2, it is determined whether the character Li belonging to the target character string is the first character unit [X] X1 of abbreviation 1 [XX]. If YES, the process moves to step S3; if NO, it moves to step S6. Counter i has the same meaning as in FIG. 2.

In step S3, counter j is initialized to j = 1. Counter j has the same meaning as in FIG. 2.

In step S4, it is determined whether the character Li+j of the target character string is the second character unit [X] X2 of abbreviation 1 [XX]. If YES, the process moves to step S5; if NO, it moves to step S9. In step S5, the result is judged "searchable (OK)" and the process ends.

Steps S6 and S7 repeat step S2 for all n characters of the database. If the result in step S7 is NO, the process returns to step S2 and the same procedure is applied to the next target character string. If YES, the result is judged "not searchable (NG)" in step S8 and the process ends.

In step S9, j is incremented, and in step S10 it is determined whether j exceeds 4. If it does, the process moves to step S6; otherwise it returns to step S4. Steps S9 and S10 form the procedure that checks sequentially whether the character unit [X] X2 is contained in the target character string.

Comparing the two schemes, the second can be said to be somewhat simpler, as is clear from the total number of steps in their flowcharts: 13 steps for the former and 8 steps for the latter. The reason is that the first scheme selects each target character string in turn and checks whether the first and second character units of the abbreviation are present in it, whereas the second scheme first searches the characters of the database for one that matches the first character unit of the abbreviation, which constrains the subsequent steps.
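The difference between the first and second schemes can be illustrated with the sketch below. It is a simplified reading rather than a step-by-step transcription of the flowcharts in FIG. 2 and FIG. 3: it assumes a two-unit abbreviation, uses 0-based indexing instead of counters starting at 1, and the function names `first_scheme` and `second_scheme` and the parameter `window_len` are hypothetical.

```python
def first_scheme(db: str, x1: str, x2: str, window_len: int = 5) -> bool:
    """Take every target string in turn and look for x1, then x2 after it."""
    for i in range(max(len(db) - window_len, 0) + 1):
        window = db[i:i + window_len]
        p = window.find(x1)
        if p >= 0 and window.find(x2, p + 1) >= 0:
            return True   # searchable (OK)
    return False          # not searchable (NG)


def second_scheme(db: str, x1: str, x2: str, window_len: int = 5) -> bool:
    """Locate each occurrence of x1 first; only then inspect the target
    string that starts at that occurrence for x2."""
    for i, ch in enumerate(db):
        if ch != x1:
            continue      # skip characters that cannot start a match
        if x2 in db[i + 1:i + window_len]:
            return True   # searchable (OK)
    return False          # not searchable (NG)
```

Under these assumptions both functions return the same verdict for the same input. The second one does less work per character because it examines a target string only where the first character unit has already been found, which matches the reason the text gives for its being simpler.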

【Effect of the invention】

As explained above, in the present invention, the database to be searched is judged searchable when any one of its target character strings, which consist of consecutive characters in their original order and have a predetermined number of characters exceeding the total number of the character units, contains each character unit in the same order as in the character string of the abbreviation. Conventionally, when a character-coded database could not be searched with a certain abbreviation, it was also necessary to search with the several formal or semi-formal words related to that abbreviation. Because the database can now be searched directly with the abbreviation alone, the invention has the excellent effect of improving search efficiency accordingly.

[Brief explanation of drawings]

FIG. 1 is a conceptual diagram based on an application example of the method of the present invention, FIG. 2 is a flowchart showing the first search operation in that application example, and FIG. 3 is a flowchart showing the second search operation in the same application example. Explanation of reference numerals: 1: abbreviation; 1A, 1B: character units; 2: target character string.

Claims

[Claims]

1) A search method using abbreviations, for searching a character-coded database on the basis of an abbreviation forming a character string, characterized by: dividing the character string of the abbreviation into character units; selecting, from the database to be searched, target character strings that consist of consecutive characters in their original order and have a predetermined number of characters exceeding the total number of the character units; and judging the database searchable when any one of the target character strings includes each of the character units in the same order as in the character string of the abbreviation.