JPS62241026A - Character string retrieving system

Info

Publication number: JPS62241026A
Application number: JP61083845A
Authority: JP
Inventors: Hisanori Takahashi; 高橋　久則
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1986-04-11
Filing date: 1986-04-11
Publication date: 1987-10-21

Abstract

PURPOSE: To shorten the execution time of a character string retrieving program by determining the single character with the lowest usage frequency in the search character string and using that character to select the portions of the data to be compared, thereby reducing the number of comparisons between the search character string and the data to be retrieved.

CONSTITUTION: A used character analyzing means 1 reads the data 100 to be retrieved, analyzes it, and outputs used character distribution data 200, in which the frequency of occurrence of each character in the data 100 is recorded. A deciding means 20 of a character string retrieving means 2 determines, based on the data 200, the single character with the lowest usage frequency in the search character string 300. A retrieving means 21 then reads the data 100 and searches it for the character determined by the means 20, that is, the character to be retrieved first, to select the data to be compared. A comparison means 22 compares the data selected by the means 21 with the character string 300 and outputs the retrieved data 400 when they coincide.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of industrial application]

The present invention relates to a character string search method, and in particular to a character string search method in a computer system.

[Conventional technology]

Conventionally, this type of character string search method retrieves data containing a given character string from data stored in a file as follows: the first character of the search character string is compared sequentially against the searched data, starting from its beginning, and at each position where the same character is detected, the entire search character string is compared to determine whether the condition is met.
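The conventional procedure described above can be sketched as follows. This is only an illustrative sketch, not code from the patent: the function name `conventional_search` and the counter of full comparisons are assumptions introduced here to make the cost visible.

```python
def conventional_search(data: str, query: str) -> tuple[list[int], int]:
    """First-character scan: return (match positions, number of full comparisons).

    Hypothetical helper for illustration; the patent gives no code.
    """
    hits: list[int] = []
    full_comparisons = 0
    if not query:
        return hits, full_comparisons
    first = query[0]
    for pos in range(len(data) - len(query) + 1):
        # Step 1: compare only the first character of the search string.
        if data[pos] != first:
            continue
        # Step 2: on a first-character hit, compare the whole search string.
        full_comparisons += 1
        if data[pos:pos + len(query)] == query:
            hits.append(pos)
    return hits, full_comparisons


print(conventional_search("AAAAABAAAC", "AAAC"))  # ([6], 6)
```

In this small input the first character "A" is common, so six full comparisons are needed to find a single match.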

[Problem that the invention seeks to solve]

In the conventional character string search method described above, the execution time of the character string search program is proportional to the number of instructions executed and the number of characters referenced during execution. Consequently, when the first character of the search character string occurs many times in the searched data but few character strings actually satisfy the condition, unnecessary instruction executions and character references are performed, and the execution time of the character string search program becomes long.

In view of the above, an object of the present invention is to provide a character string search method that can shorten the execution time of a character string search program.

[Means for solving the problem]

The character string search method of the present invention is a method in which data stored in a file is input and a specified character string is examined under specified conditions to retrieve the data that satisfies those conditions. It includes used character distribution analysis means, which inputs the searched data, analyzes the distribution of the characters used, and creates used character distribution data, and character string search means, which performs a conditional search for character strings in the searched data based on the used character distribution data created by the used character distribution analysis means.

[Example]

Next, the present invention will be described with reference to the drawing.

The figure is a block diagram showing one embodiment of the present invention.

The character string search method of this embodiment consists of used character distribution analysis means 1, character string search means 2, searched data 100, used character distribution data 200, search character string 300, and retrieved data 400.

The character string search means 2 consists of determining means 20, which determines the one character to be searched for first based on the used character distribution data 200; search means 21, which searches the searched data 100 for the character determined by the determining means 20; and comparison means 22, which compares the data found by the search means 21 with the search character string 300 and outputs the retrieved data 400.

Next, the operation of the character string search method of this embodiment, configured as described above, will be explained.

First, the used character distribution analysis means 1 reads the searched data 100, analyzes it, and outputs the used character distribution data 200. The used character distribution data 200 records the frequency of occurrence of each character in the searched data 100. In this embodiment, for example, it records that character A accounts for 1%, character B for 3%, character C for 5%, and so on.
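As a rough illustration of this analysis step, the sketch below computes a per-character percentage table. The function name `analyze_character_distribution` and the choice of a dictionary of percentages are assumptions made here; the patent does not specify a data layout for the used character distribution data 200.

```python
from collections import Counter


def analyze_character_distribution(data: str) -> dict[str, float]:
    """Return the share of each character in `data`, in percent.

    Hypothetical stand-in for the used character distribution analysis means 1.
    """
    total = len(data)
    if total == 0:
        return {}
    counts = Counter(data)  # occurrences of each character
    return {ch: 100.0 * n / total for ch, n in counts.items()}


distribution = analyze_character_distribution("ABBBCCCCCC")
print(distribution)  # {'A': 10.0, 'B': 30.0, 'C': 60.0}
```

Only characters that actually occur in the data receive an entry, which is what later allows a missing character to be detected by a simple lookup.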

The determining means 20 determines, based on the used character distribution data 200, the least frequently used character in the search character string 300. In this embodiment, for example, when the search character string is "ABC", character "A" is determined.
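The decision step can be sketched as below, under the assumption that the distribution is a mapping from character to percentage. The name `choose_anchor_character` is hypothetical, and returning the character's offset within the search string is an addition made here for convenience; the patent only states that one character is determined.

```python
def choose_anchor_character(query: str, distribution: dict[str, float]) -> tuple[int, str]:
    """Return (offset, character) of the least frequent character in `query`.

    Characters missing from `distribution` are treated as frequency 0.
    Ties are resolved in favor of the leftmost character.
    """
    if not query:
        raise ValueError("query must not be empty")
    offset = min(range(len(query)), key=lambda i: distribution.get(query[i], 0.0))
    return offset, query[offset]


# Frequencies taken from the embodiment: A = 1%, B = 3%, C = 5%.
example_distribution = {"A": 1.0, "B": 3.0, "C": 5.0}
print(choose_anchor_character("ABC", example_distribution))  # (0, 'A')
```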

If the search character string 300 contains a character that does not appear in the used character distribution data 200, it follows, without searching the searched data 100 at all, that the search character string 300 does not exist in it.
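This early rejection amounts to a membership test against the distribution table. A minimal sketch, assuming the table holds entries only for characters that occur in the searched data (the name `may_contain` is hypothetical):

```python
def may_contain(query: str, distribution: dict[str, float]) -> bool:
    """Return False if `query` uses any character absent from the searched data.

    A True result does not guarantee a match; it only means a search is needed.
    """
    return all(ch in distribution for ch in query)


table = {"A": 1.0, "B": 3.0, "C": 5.0}
print(may_contain("ABC", table))  # True
print(may_contain("ABZ", table))  # False
```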

Next, the search means 21 reads the searched data 100, searches it for the one character determined by the determining means 20 as the character to be searched for first, and selects the data to be compared.

The comparison means 22 compares the data selected by the search means 21 with the search character string 300 and outputs the data that satisfies the condition as the retrieved data 400.
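Putting the steps together, one possible reading of the overall flow is sketched below. It is an interpretation rather than the patented implementation: the function name, the use of raw counts instead of percentages, and the use of match positions as the "retrieved data" are all assumptions made for illustration.

```python
from collections import Counter
from typing import Optional


def search_with_rarest_character(
    data: str, query: str, counts: Optional[Counter] = None
) -> list[int]:
    """Return start positions of `query` in `data`, scanning for its rarest character."""
    if not query or len(query) > len(data):
        return []
    if counts is None:
        counts = Counter(data)  # analysis step (means 1); reusable across queries
    # Decision step (means 20): pick the least frequent character of the query.
    offset = min(range(len(query)), key=lambda i: counts[query[i]])
    anchor = query[offset]
    if counts[anchor] == 0:
        return []  # a query character never occurs, so no search is needed
    hits = []
    # Search step (means 21): visit only the positions holding the anchor character.
    pos = data.find(anchor, offset)
    while pos != -1:
        start = pos - offset
        # Comparison step (means 22): full comparison at the candidate position.
        if data.startswith(query, start):
            hits.append(start)
        pos = data.find(anchor, pos + 1)
    return hits


print(search_with_rarest_character("AAAAABAAAC", "AAAC"))  # [6]
```

For this input the rarest query character is "C", which occurs once, so only one full comparison is made. Passing a precomputed `counts` lets several search strings share one analysis pass over the data.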

[Effect of the invention]

As explained above, the present invention analyzes the distribution of the characters present in the searched data, determines from the analysis result the least frequently used character in the search character string, searches the searched data for that character to select the portions of the searched data to be compared with the search character string, and thereby reduces the number of comparisons between the search character string and the searched data. This has the effect of shortening the execution time of the character string search program. It is particularly effective when the searched data is searched with multiple search character strings.

[Brief explanation of drawings]

The figure is a block diagram showing one embodiment of the present invention. In the figure: 1 ... used character distribution analysis means; 2 ... character string search means; 20 ... determining means; 21 ... search means; 22 ... comparison means; 100 ... searched data; 200 ... used character distribution data; 300 ... search character string; 400 ... retrieved data.

Claims

A character string search method in which data stored in a file is input and a specified character string is examined under specified conditions to retrieve data that satisfies the conditions, the method comprising: used character distribution analysis means for inputting the searched data, analyzing the distribution of the characters used, and creating used character distribution data; and character string search means for performing a conditional search for character strings in the searched data based on the used character distribution data created by the used character distribution analysis means.