JPS60142734A

JPS60142734A - Character string comparator

Info

Publication number: JPS60142734A
Application number: JP58246703A
Authority: JP
Inventors: Takeshi Shinoki (篠木 剛)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1983-12-29
Filing date: 1983-12-29
Publication date: 1985-07-27

Abstract

PURPOSE: To detect a mismatch between character strings at high speed by comparing their hash values. CONSTITUTION: Character string data I, which serves as the reference, is input to the input register 1. The data I is hashed by the first hash processing section 3, and the resulting hash value is set in section 5-2 of the character string data register 5. Character string data II, which is to be compared with the reference for identity, is input to the input register 2; the data II is hashed by the second hash processing section 4, and the resulting hash value is set in section 6-2 of the character string data register 6. The comparator 7 compares the hash values set in sections 5-2 and 6-2; when they differ, the comparator 7 outputs 0 and the inverter 8 outputs logical 1. Thus, when the hash values are valid and differ from each other, the AND gate 9 outputs a mismatch signal.

Description

[Detailed Description of the Invention]

[Technical Field of the Invention]

The present invention relates to a character string comparison device, and in particular to one that speeds up the preprocessing for detecting a match by quickly finding that the character strings to be compared do not match.

[Technical Background]

A data processing device processes not only numbers but also characters. For example, it may detect whether two character strings match, or extract specific characters from a character string. It may also change "AM" in "I AM A BOY." to "WAS". To extract "AM" in such a case, the character strings must be compared.

When characters are processed, each character is represented by a code of, for example, one byte, so determining whether two strings match requires comparing the character codes that make up each string. When there are many strings to compare, however, detecting matches by comparing character codes one by one makes the overall process take a very long time, so high-speed processing is not possible.
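As a rough software analogy (the patent describes hardware; this Python is only a sketch), a direct comparison walks both strings one byte code at a time. When strings are equal or nearly equal, almost every code is visited, which is what makes many such comparisons slow. The function name `compare_bytewise` is hypothetical.

```python
def compare_bytewise(a: bytes, b: bytes) -> tuple[bool, int]:
    """Compare two strings character code by character code.

    Returns (equal, number_of_code_comparisons) so the cost is visible.
    """
    comparisons = 0
    for x, y in zip(a, b):
        comparisons += 1
        if x != y:
            # First differing code found: the strings cannot match.
            return False, comparisons
    # All overlapping codes matched; the strings are equal only if lengths agree.
    return len(a) == len(b), comparisons
```

Comparing `b"I AM A BOY."` with itself costs 11 code comparisons, one per character.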

[Conventional technology and problems]

Therefore, one approach is to first detect, among the comparison targets, those that do not match, which reduces the number of strings whose character codes must be compared, that is, the number of candidates.

Conventionally, to speed up mismatch detection when comparing character string data, the length information held with each string was used: the lengths were compared before the strings themselves, and comparison candidates were selected on that basis. With this method, however, strings often have the same length yet differ in content, so the comparison targets are not narrowed down sufficiently, and processing still takes considerable time.
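A minimal sketch of the conventional length-based screening, assuming one-byte character codes; the function name and candidate list are illustrative only. Candidates of the same length as the reference pass the filter even when their contents differ.

```python
def length_prefilter(reference: bytes, candidates: list[bytes]) -> list[bytes]:
    """Conventional screening: keep only candidates whose length equals the reference's."""
    return [c for c in candidates if len(c) == len(reference)]


survivors = length_prefilter(b"AM", [b"AM", b"IS", b"WAS", b"BE", b"A"])
# b"IS" and b"BE" survive even though they differ from b"AM",
# so they still need a full code-by-code comparison.
```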

[Purpose of the invention]

An object of the present invention is to overcome this drawback and to provide a character string comparison device that can detect mismatches between character strings more accurately.

[Structure of the invention]

To achieve this object, the character string comparison device of the present invention, in a data processing device that processes character string data, is provided with: a hash processing unit that outputs the hash value of a character string; validity indication data holding means that indicates whether the hash value is valid; and comparison means that detects whether the hash values of a plurality of character strings match. The device detects that the hash value of each character string is valid and that the hash values of the character strings differ.

[Embodiments of the invention]

An embodiment of the present invention will be described with reference to the accompanying drawing. In the drawing, 1 is a first input register, 2 is a second input register, 3 is a first hash processing unit, 4 is a second hash processing unit, 5 is a first character string data register, 6 is a second character string data register, 7 is a comparator, 8 is an inverter, and 9 is an AND gate.

The first input register 1 receives the internal representation of the reference character string used in the comparison, for example character string data such as character codes. The second input register 2 receives the comparison character string, which is checked for whether it has the same hash value as the reference string. The first hash processing unit 3 applies processing based on a hash function, for example summing the character codes, to the character string data set in the first input register 1, and outputs the resulting hash value. The second hash processing unit 4 performs the same processing as the first hash processing unit 3.

The first character string data register 5 holds various data about the reference character string. It has a section 5-1 holding the length of the string, a section 5-2 holding the hash value computed by the first hash processing unit 3, a section 5-3 holding a valid flag V1 that indicates whether this hash value is valid or invalid, a section 5-4 holding the character string data itself, and so on.
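The example hash function named above (summing the character codes) can be sketched as follows. Treating each character as a one-byte code follows the background section; truncating the sum to a fixed-width register (`width_bits`, default 16) is an assumption, since the document does not give the width of section 5-2.

```python
def sum_hash(data: bytes, width_bits: int = 16) -> int:
    """Hash a string by summing its character codes, as in the embodiment.

    The sum is truncated to width_bits bits (an assumed register width).
    """
    return sum(data) % (1 << width_bits)
```

For instance, "AM" and "IS" have the same length but hash to 142 and 156, so comparing hash values separates them where a length check could not.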

Like the first character string data register 5, the second character string data register 6 consists of sections 6-1, 6-2, 6-3, 6-4, and so on.

The valid flags V1 and V2 serve the following purpose.

When character string data is frequently rewritten, the hash value would be recomputed many times, creating overhead. To deal with this, a valid flag is provided: when the character string data whose hash value has already been computed is rewritten, the flag indicates that the hash value is invalid. This allows the device to adapt flexibly to the data being processed.
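One possible software model of a string data register with its valid flag is sketched below. The field names are hypothetical, and the policy of deferring hash recomputation until a valid hash is actually needed (`refresh_hash`) is an interpretation of the text, not something the document spells out.

```python
from dataclasses import dataclass


def sum_hash(data: bytes) -> int:
    # Example hash from the embodiment: sum of the character codes.
    return sum(data)


@dataclass
class StringDataRegister:
    """Model of register 5 or 6 (hypothetical names).

    length -> section x-1, hash_value -> x-2, valid -> x-3, data -> x-4.
    """
    data: bytes = b""
    length: int = 0
    hash_value: int = 0
    valid: bool = False

    def load(self, data: bytes) -> None:
        # Initial load: store the string, its length, and a freshly computed hash.
        self.data = data
        self.length = len(data)
        self.hash_value = sum_hash(data)
        self.valid = True

    def rewrite(self, data: bytes) -> None:
        # Frequent rewrite: update the string but only clear the valid flag,
        # instead of recomputing the hash on every rewrite.
        self.data = data
        self.length = len(data)
        self.valid = False

    def refresh_hash(self) -> None:
        # Recompute the hash only when it is marked invalid.
        if not self.valid:
            self.hash_value = sum_hash(self.data)
            self.valid = True
```

After `rewrite`, the stale hash value is still stored but flagged invalid, so it must not be used to declare a mismatch.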

The comparator 7 compares the hash values held in sections 5-2 and 6-2, and outputs "1" when they are the same and "0" when they differ.
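A bit-level sketch of the comparator 7, inverter 8, and AND gate 9. That the valid flags V1 and V2 are fed directly into the AND gate 9 is an assumption inferred from the text (the drawing is not reproduced here), and the function names are hypothetical.

```python
def comparator(h1: int, h2: int) -> int:
    # Comparator 7: "1" when the hash values are equal, "0" when they differ.
    return 1 if h1 == h2 else 0


def inverter(x: int) -> int:
    # Inverter 8: flips a single bit.
    return 1 - x


def and_gate(*inputs: int) -> int:
    # AND gate 9: "1" only when every input is "1".
    return 1 if all(inputs) else 0


def mismatch_signal(h1: int, v1: int, h2: int, v2: int) -> int:
    # Asserted only when both hash values are valid and they differ.
    return and_gate(inverter(comparator(h1, h2)), v1, v2)
```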

Next, the operation of the present invention will be described.

First, the character string data of the reference string is input to the first input register 1. This data is set in section 5-4 of the first character string data register 5 and is also hashed by the first hash processing unit 3; the resulting hash value is set in section 5-2 of the register 5. In addition, the length of the character string data is determined by length detection means (not shown) and set in section 5-1.

The character string data to be compared for identity with the reference string is input to the second input register 2. This data is hashed by the second hash processing unit 4, the resulting hash value is set in section 6-2 of the second character string data register 6, and the character string data itself is set in section 6-4. The length of the compared string is likewise set in section 6-1. The hash values set in sections 5-2 and 6-2 are then compared by the comparator 7; when they differ, the comparator 7 outputs "0" and the inverter 8 outputs "1". Therefore, when the hash values are valid and do not match, the AND gate 9 outputs a mismatch signal indicating that the hash values differ, which shows directly that the string set in the second character string data register 6 differs from the string set in the first character string data register 5.

If the hash values match, the comparator 7 outputs "1" and the inverter 8 outputs "0", so the AND gate 9 does not output a mismatch signal. In this case, if the valid flags V1 and V2 are both "1" (valid), the character string data set in the second character string data register 6 may match the character string data set in the first character string data register 5.
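The operation described above can be walked through for one pair of strings in a short Python sketch. Registers are modeled as plain dictionaries, and both valid flags are assumed set because the hashes are freshly computed; all names are hypothetical.

```python
def sum_hash(data: bytes) -> int:
    # Hash processing units 3 and 4: sum of the character codes.
    return sum(data)


def compare_strings(reference: bytes, candidate: bytes) -> str:
    """Return "mismatch" if the mismatch signal would be raised, else "possible match"."""
    # Register 5: length (5-1), hash (5-2), valid flag V1 (5-3), data (5-4).
    reg5 = {"length": len(reference), "hash": sum_hash(reference), "valid": True, "data": reference}
    # Register 6: the compared string, set up in the same way with valid flag V2.
    reg6 = {"length": len(candidate), "hash": sum_hash(candidate), "valid": True, "data": candidate}
    equal = reg5["hash"] == reg6["hash"]                        # comparator 7
    signal = (not equal) and reg5["valid"] and reg6["valid"]    # inverter 8 and AND gate 9
    return "mismatch" if signal else "possible match"
```

Note that `compare_strings(b"AM", b"MA")` returns "possible match" although the strings differ, so a matching hash only selects a candidate for the exact code comparison.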

In this way, mismatches between character strings can be detected accurately and quickly. When there are many strings to compare, the number of strings whose identity must be checked exactly by character codes can therefore be reduced substantially and quickly, and as a result, matches can be determined quickly.
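A minimal sketch of using the hash comparison as a prefilter over many candidates, followed by exact comparison of the survivors only. The sum-of-codes hash is the document's example; the function name and candidate list are illustrative.

```python
def find_matches(reference: bytes, candidates: list[bytes]) -> tuple[list[bytes], int]:
    """Screen candidates by hash, then compare the survivors code by code.

    Returns the exact matches and how many candidates needed the exact comparison.
    """
    ref_hash = sum(reference)  # sum-of-codes hash from the embodiment
    # Hash screening: drop every candidate whose hash differs (mismatch signal raised).
    survivors = [c for c in candidates if sum(c) == ref_hash]
    # Exact character-code comparison only for the remaining candidates.
    matches = [c for c in survivors if c == reference]
    return matches, len(survivors)


matches, exact_checks = find_matches(b"AM", [b"AM", b"IS", b"WAS", b"BE", b"MA"])
```

Here only 2 of the 5 candidates reach the exact comparison, whereas a length check alone would pass 4 of them.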

In the above description, the sum of the character codes was used as an example hash function, but the hash function is of course not limited to this.
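One limitation of the sum-of-codes example is that it ignores character order, so anagrams such as "AM" and "MA" always collide. As an illustration of another choice (a position-weighted polynomial hash, which is not described in the document), the sketch below distinguishes them; `base=31` and a 16-bit modulus are arbitrary assumptions.

```python
def sum_hash(data: bytes) -> int:
    # The document's example: order-insensitive sum of character codes.
    return sum(data)


def polynomial_hash(data: bytes, base: int = 31, modulus: int = 1 << 16) -> int:
    # Alternative hash (not from the document): weight each code by its position.
    h = 0
    for code in data:
        h = (h * base + code) % modulus
    return h
```

With these parameters, "AM" hashes to 2092 and "MA" to 2452, while both have a code sum of 142.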

Although an example with two input registers and two hash processing units has been described, a single input register and a single hash processing unit may of course be used instead.

[Effect of the Invention]

According to the present invention, mismatches between character strings can be detected quickly by comparing their hash values. In addition, when character string data is frequently rewritten, the valid flag can be used to suppress recalculation of the hash value, so the device can adapt flexibly to the data being processed.

[Brief explanation of the drawing]

The figure is a configuration diagram of an embodiment of the present invention. In the figure, 1 is a first input register, 2 is a second input register, 3 is a first hash processing unit, 4 is a second hash processing unit, 5 is a first character string data register, 6 is a second character string data register, 7 is a comparator, 8 is an inverter, and 9 is an AND gate.

Patent applicant: Fujitsu Ltd. Agent: Patent Attorney Akira Yamatani

Claims

[Claims]

A character string comparison device, in a data processing device that processes character string data, comprising: a hash processing unit that outputs hash values of character strings; validity indication data holding means that indicates whether the hash values are valid; and comparison means that detects whether the hash values of a plurality of character strings match, wherein the device detects that the hash value of each character string is valid and that the hash values of the character strings differ.