JPH11306194A

JPH11306194A - Method for calculating hash value of character string and machine-readable recording medium where program for implementing same method is recorded

Info

Publication number: JPH11306194A
Application number: JP10113243A
Authority: JP
Inventors: Hideki Nishimura; 英樹西村; Kaoru Hieda; 薫稗田
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1998-04-23
Filing date: 1998-04-23
Publication date: 1999-11-05
Anticipated expiration: 2018-04-23
Also published as: JP4179660B2

Abstract

PROBLEM TO BE SOLVED: To provide a method of calculating a hash value for a character string in which the appearance frequency of character sequences is biased. SOLUTION: The method includes a step 252 of preparing a machine-readable table that uniquely associates each specific character string with a shorter converted character string, steps 256 and 268 of using a computer to convert a character string appearing in the character string to be processed into the corresponding converted character string by referring to the table, and a step 260 of using the computer to calculate the hash value from the character string to be processed that includes the converted character string.

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a method of calculating a hash value for a character string, and more particularly to a method of efficiently calculating a hash value by computer when the access history of home pages and the like on the WWW (World Wide Web) of the so-called Internet is saved in a storage area determined using a hash value calculated from their URLs (Uniform Resource Locators).

【０００２】[0002]

2. Description of the Related Art A program called a browser is known as a program for browsing home pages on the WWW of the Internet. To browse a desired home page with a browser, the user basically gives the browser the URL of that home page. The browser normally accesses the specified resource on the Internet according to the given URL and displays, for example, the home page on the monitor of the computer on which the browser is running.

[0003] A URL combines the name of the server that manages a home page and the file name of that home page on the server with the name of the protocol used for access. For example, in the URL "http://www.sharp.co.jp/sc/zaurus/index.html", the part "http:" specifies the protocol name (http (hypertext transfer protocol)), the part "//www.sharp.co.jp" represents the server name, and the part "/sc/zaurus/index.html" indicates the file name (including the file path).

[0004] A server name usually consists of a name corresponding to the service the server provides (www, ftp, and so on) and the name of the domain in which the server resides. A domain name is the name (unique on the Internet) given to a partial network making up a network (the Internet in this example). In the example above, the part "www" indicates that the server provides a www service according to the http protocol, and the part "sharp.co.jp" represents the domain name.
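As a rough illustration of the decomposition described in paragraphs [0003] and [0004], the following sketch splits the example URL with Python's standard `urllib.parse.urlsplit`. The variable names are chosen here for illustration only, and splitting the server name at the first dot is an assumption that happens to fit this example rather than a general rule.

```python
from urllib.parse import urlsplit

# Example URL taken from paragraph [0003].
parts = urlsplit("http://www.sharp.co.jp/sc/zaurus/index.html")

protocol = parts.scheme   # protocol name
server = parts.netloc     # server name
file_name = parts.path    # file name, including the file path

# Assumption for illustration: the service name is everything before
# the first dot and the remainder is the domain name.
service, _, domain = server.partition(".")

print(protocol, server, file_name, service, domain)
```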

[0005] As the number of computers connected to the Internet grows and traffic on the Internet increases, communication speed may drop and good service may no longer be provided. Also, from the standpoint of an individual browser user, when a home page that has already been accessed and displayed is accessed again after only a short interval, spending the same time as the first access to fetch the home page again is a problem in terms of response.

[0006] A typical browser therefore saves the file of any home page it has accessed once as a cache file, together with a history of URLs, in a storage device (typically a fixed disk) of the computer on which the browser is running. When the same URL is given again, the browser checks whether the cache files include one that matches the URL and, if so, accesses and displays the file among the cache files without accessing the remote server. Only when no cache file matches the URL does the browser access the URL on the Internet, display the result, and save it as a cache file.

[0007] Having cache files prevents an increase in traffic on the Internet and gives the user a good response. There are various schemes for maintaining the contents of the cache files, but they are not directly related to the present invention and are not described in detail here.

[0008] Various schemes are conceivable in general for storing files on a fixed disk, but because it must be possible, when a URL is given, to search at high speed for whether a file corresponding to that URL exists, the schemes available for accumulating the history of cache files are naturally limited. For example, consider a scheme in which the URL of each file and its storage address are paired and simply accumulated in order in a history list, and each time a URL is given the history list is examined from the beginning to see whether a matching URL exists. This scheme has the problem that the average time required grows as the data increases. For this reason, a scheme that makes the history list two-stage by using a hash value calculated from the URL of the file according to a predetermined formula has conventionally been in general use.

[0009] In this scheme, a hash value is calculated from the given URL. A mod operation is typically used in calculating the hash value, and URLs are classified into a plurality of groups on the basis of their hash values. For example, letting the character U_i (i = 1 … n) denote the i-th character of the URL character string, a variable SUM is calculated as follows.

[0010] SUM = 0 is set as the initial value. The following calculation is repeated for i = 1 to n.

【００１１】[0011]

[Equation 1] $\mathrm{SUM} = \mathrm{SUM} \times 5 + U_i$

When the calculation has been completed up to i = n, the lower 32 bits of SUM are taken as the hash code (hash value). Taking only the lower 32 bits as the hash code amounts to performing a mod operation, and URLs are classified on the basis of the hash code.
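The calculation in paragraphs [0009] to [0011] can be sketched as follows. This is a minimal illustration, assuming that each character U_i is taken as its byte value; the function name `hash_code` is hypothetical. Because Python integers do not overflow, masking once at the end gives the same result as keeping only the lower 32 bits.

```python
def hash_code(url: bytes) -> int:
    """Conventional hash from [Equation 1]: SUM = SUM * 5 + U_i."""
    total = 0                      # initial value SUM = 0
    for byte in url:               # repeat for i = 1 .. n
        total = total * 5 + byte
    return total & 0xFFFFFFFF      # keep the lower 32 bits (mod 2**32)


print(hash_code(b"ab"))  # (0 * 5 + 97) * 5 + 98
```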

[0012] Each URL is distributed according to the hash code calculated in this way. That is, the history list becomes a list of hash codes, and the URLs having a given hash code are linked to that hash code as a sublist. To each URL is attached the storage address, on the fixed disk, of the file corresponding to that URL.

[0013] When a URL is given, its hash code is first calculated according to the formula above. The sublist linked to the calculated hash code in the history list is then traversed to check whether the target URL exists in that sublist. If the URL exists, the fixed disk is accessed using the storage address attached to the URL, and the target file is retrieved and displayed. If not, the URL is regarded as absent from the history and the target URL is accessed on the Internet.
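The two-stage history list of paragraphs [0012] and [0013] might be sketched as below. This is an illustration only: the class name `HistoryList`, the use of a Python dictionary for the first stage, and the integer storage addresses are all assumptions made here, not details from the specification.

```python
from collections import defaultdict


def hash_code(url: bytes) -> int:
    """Conventional hash: SUM = SUM * 5 + U_i, lower 32 bits."""
    total = 0
    for byte in url:
        total = total * 5 + byte
    return total & 0xFFFFFFFF


class HistoryList:
    """First stage: hash code. Second stage: sublist of (URL, address)."""

    def __init__(self) -> None:
        self._sublists = defaultdict(list)

    def store(self, url: bytes, address: int) -> None:
        # Link the URL and its storage address to the sublist of its hash code.
        self._sublists[hash_code(url)].append((url, address))

    def lookup(self, url: bytes):
        # Only the sublist for this hash code is traversed.
        for stored_url, address in self._sublists.get(hash_code(url), []):
            if stored_url == url:
                return address
        return None  # not in the history: the caller would go to the Internet


history = HistoryList()
history.store(b"http://www.sharp.co.jp/", 4096)
print(history.lookup(b"http://www.sharp.co.jp/"))
```

A lookup that returns `None` corresponds to the case in which the browser accesses the URL on the Internet.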

[0014] With such a two-stage history list, the number of character string comparisons needed to search for a URL is at most the number of elements in the sublist linked to one hash code, so the number of comparisons is greatly reduced compared with the case where the history is stored sequentially.

【００１５】[0015]

[Problems to Be Solved by the Invention] When classification by such hashing is used, it is desirable that the number of URLs having each hash code value be equal across hash codes. It has been found, however, that URLs are not classified evenly even when hash codes are used. This is due to the following factors.

[0016] As described above, a URL is a combination of a protocol name, a server name, and a file name. The kinds of protocols are limited, however, and the http protocol is used in most cases, particularly when a browser program makes the access, so the part of the URL character string that represents the protocol is "http://" in almost all cases. When the same character string is in the same part of the URLs, the hash code obtained from that part is identical when calculated according to the formula above.

[0017] In many cases the leading part of the server name is also fixed to a character string representing the service, for example "www". This part therefore makes no difference in the hash code calculation either.

[0018] Furthermore, the URLs of data residing in the same domain are often mostly identical and differ in only a small part. To begin with, within the same domain the domain name part of the URL is identical. In this case, too, no difference arises in the hash code calculation.

[0019] As a result, because the character sequences appearing in URL character strings are biased, the classification of URLs by hash code is also biased.

[0020] When the classification of URLs is biased in this way, a given URL character string must be compared with a large number of URL character strings that have the same hash code as the one calculated for it. In this case, the character strings are compared sequentially from the beginning, and the comparison with the next URL character string starts only after a mismatching part is found. However, the URL character strings of data belonging to the same domain, for example, are often identical for most of their length from the beginning and differ only in the last few characters. In that case, many characters are compared from the beginning and the difference is recognized only near the end, so many character comparisons must be repeated for each URL character string. Consequently, when the hash code does not disperse the URLs efficiently, this, together with the larger number of URLs to be compared, makes the search inefficient.
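To make the cost described in paragraph [0020] concrete, the following sketch counts how many characters are compared before two URLs in the same domain are found to differ. The helper name and the two file names `a.html` and `b.html` are hypothetical examples, not taken from the specification.

```python
def comparisons_until_mismatch(a: str, b: str) -> int:
    """Count character comparisons made, from the start, up to the first mismatch."""
    count = 0
    for x, y in zip(a, b):
        count += 1
        if x != y:
            break
    return count


# Two hypothetical URLs that differ only near the end.
first = "http://www.sharp.co.jp/sc/a.html"
second = "http://www.sharp.co.jp/sc/b.html"
print(comparisons_until_mismatch(first, second))
```

The shared prefix "http://www.sharp.co.jp/sc/" has to be scanned in full before the difference is seen, which is the inefficiency the paragraph describes.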

[0021] To avoid this, the hash function for calculating the hash code must be made more complex so that the hash codes are dispersed efficiently. Even so, this cannot solve the problem that the character strings to be compared are long when direct comparison is made within the same hash code. In addition, the more complex the function, the longer the processing takes.

[0022] This problem is not limited to browsers that cache Internet URLs; it is also encountered whenever the storage location of data with similar properties is determined by hashing. When such hashing is performed, it is also preferable to save as much of the required memory area as possible and to be able to perform the hashing calculation at high speed.

[0023] SUMMARY OF THE INVENTION An object of the present invention is therefore to provide a method of calculating a hash value for a character string that enables efficient hashing of character strings, such as URLs, in which the appearance frequency of character sequences is biased.

【００２４】[0024]

[Means for Solving the Problems] The method according to the invention of claim 1 is a method of calculating a hash value for a character string to be processed in which character sequences appear with biased frequency, and includes the steps of: preparing a machine-readable table for uniquely associating a specific character string with a shorter converted character string; using a computer to convert a character string appearing in the character string to be processed into the corresponding converted character string by referring to the table; and using the computer to calculate a hash value based on the character string to be processed that includes the converted character string.

[0025] The converted character string is shorter than the character string before conversion. The hash value can therefore be calculated at high speed, and less capacity is needed for the area that stores the character strings. In addition, character strings must be compared directly when their hash values are identical, but because the short converted character strings are what is compared, the comparison can be performed at high speed.

[0026] In the method according to the invention of claim 2, in addition to the configuration of the invention of claim 1, the step of preparing the table includes the steps of: preparing, in the computer, character strings to be processed that appeared in the past; using the computer to tally the number of appearances and the character string length of each partial character string of those past character strings; selecting, based on the tallied numbers of appearances and character string lengths, the partial character string that would compress the past character strings most efficiently if replaced with a predetermined character, and adding it to the table; and recalculating the numbers of appearances with the selected partial character string taken into account, and repeating the adding step until a predetermined condition is satisfied.

[0027] For the character strings to be processed that appeared in the past, tallying the number of appearances and the character string length of each partial character string makes it possible to calculate the amount of compression obtained when each partial character string is replaced with a predetermined character string. Selecting the character strings to list in the table on the basis of this amount of compression allows the character string to be processed to be compressed effectively before the hash is calculated.
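The table preparation of paragraphs [0026] and [0027] can be read as a greedy selection, sketched below. Several points are assumptions made for this illustration and are not stated in the text above: the saving of a partial character string is taken as (number of appearances) × (length − 1), since each occurrence shrinks to one code; the loop stops when the table is full or no candidate saves anything; candidates are limited to `max_len` characters; and the function name `build_table` is hypothetical.

```python
def build_table(history, max_entries=30, max_len=16):
    """Greedily pick the partial strings whose replacement saves the most characters."""
    texts = list(history)
    table = {}
    for code in range(1, max_entries + 1):
        # Collect candidate partial strings of length >= 2 that contain
        # no previously assigned code (codes are below 32).
        candidates = set()
        for text in texts:
            for start in range(len(text)):
                stop = min(len(text), start + max_len)
                for end in range(start + 2, stop + 1):
                    piece = text[start:end]
                    if all(ord(ch) >= 32 for ch in piece):
                        candidates.add(piece)
        # Tally appearances and length, and keep the best saving.
        best, best_saving = None, 0
        for piece in sorted(candidates):  # sorted for a deterministic tie-break
            count = sum(text.count(piece) for text in texts)
            saving = count * (len(piece) - 1)
            if saving > best_saving:
                best, best_saving = piece, saving
        if best is None:
            break  # assumed stopping condition: nothing left to gain
        table[best] = code
        # Replace the selection so the next tally takes it into account.
        texts = [text.replace(best, chr(code)) for text in texts]
    return table


past_urls = ["http://www.a.co.jp/", "http://www.b.co.jp/", "http://www.c.co.jp/"]
print(build_table(past_urls))
```

This brute-force tally is quadratic in the URL length and is meant only to show the selection loop, not an efficient implementation.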

[0028] The recording medium according to the invention of claim 3 is a machine-readable recording medium on which is recorded a program for implementing a method of calculating a hash value for a character string to be processed in which character sequences appear with biased frequency. The program includes the steps of: preparing a machine-readable table for uniquely associating a specific character string with a shorter converted character string; converting a character string appearing in the character string to be processed into the corresponding converted character string by referring to the table; and calculating a hash value based on the character string to be processed that includes the converted character string.

[0029] The converted character string is shorter than the character string before conversion. The hash value can therefore be calculated at high speed, and less capacity is needed for the area that stores the character strings.

[0030] In the recording medium according to the invention of claim 4, in addition to the configuration of the invention of claim 3, the step of preparing the table includes the steps of: preparing character strings to be processed that appeared in the past; tallying the number of appearances and the character string length of each partial character string of those past character strings; selecting, based on the tallied numbers of appearances and character string lengths, the partial character string that would compress the past character strings most efficiently if replaced with a predetermined character, and adding it to the table; and recalculating the numbers of appearances with the selected partial character string taken into account, and repeating the adding step until a predetermined condition is satisfied.

[0031] For the character strings to be processed that appeared in the past, tallying the number of appearances and the character string length of each partial character string makes it possible to calculate the amount of compression obtained when each partial character string is replaced with a predetermined character string. Selecting the character strings to include in the table on the basis of this amount of compression allows the character string to be processed to be compressed effectively before the hash is calculated.

【００３２】[0032]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS [First Embodiment] Referring to FIG. 1, the method according to the first embodiment of the present invention is realized by a URL compression device 22 that operates in the management of a cache file area 24 by a browser 20. The device compresses a URL given by the browser 20 according to a method described later, performs a hash calculation on the compressed URL, and stores a hash record 42, consisting of the compressed URL paired with the address of the cache file corresponding to that URL, in the hash area 40 of a hash memory 26 that was calculated for the URL. Although this embodiment describes the management of URL access history by a browser, the present invention is not limited to this and is applicable to systems in general that perform hashing with a character string as the key and store or retrieve records based on the hash value.

[0033] The URL compression device 22 is actually realized by software executed on a computer, and includes: a character string list 50, which is a table of character strings before and after replacement used when replacing parts of a URL; list creation processing 56 for building the character string list 50 from a past access history file 46; character string replacement processing 52 for replacing partial character strings contained in a URL given by the browser 20 with shorter predetermined codes (1-byte codes in this embodiment) by referring to the character string list 50; and hash calculation processing 54 for performing a hash calculation on the URL whose partial character strings have been replaced with predetermined codes by the character string replacement processing 52, and for maintaining and managing the hash memory 26 according to the resulting hash value.

[0034] Referring to FIG. 2, the character string list 50 contains 30 pairs, each consisting of a character string to be replaced (left column) and the character string after replacement (right column). In the example shown in FIG. 2, the character string "http://www." is replaced with "1", the character string "http://" with "2", the character string ".co.jp/" with "3", and so on.

[0035] Ordinary personal computers generally use ASCII (American Standard Code for Information Interchange) codes. The codes actually used as characters, however, are ASCII codes "32" and above. Therefore, if codes 1 to 31 are used as the codes after compression, whether a given byte of the URL character string in a hash record 42 is a character that was in the original URL or a code substituted by the character string replacement processing 52 can be determined from the value of the byte. In this embodiment, codes 1 to 30 are used as the codes after replacement.
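The distinction drawn in paragraph [0035] can be sketched as a simple range test on each byte. The table below holds only the three pairs quoted from FIG. 2, and the `expand` helper, which restores the original URL, is an illustrative assumption; the text above describes only that the two kinds of byte can be told apart.

```python
# The three pairs quoted from FIG. 2; the full list has 30 entries.
CODE_TO_TEXT = {1: "http://www.", 2: "http://", 3: ".co.jp/"}


def is_replacement_code(byte: int) -> bool:
    """Bytes 1..31 are not printable ASCII, so they can only be replacement codes."""
    return 1 <= byte <= 31


def expand(compressed: bytes) -> str:
    """Hypothetical inverse of the replacement: restore the original URL."""
    parts = []
    for byte in compressed:
        if is_replacement_code(byte):
            parts.append(CODE_TO_TEXT[byte])
        else:
            parts.append(chr(byte))
    return "".join(parts)


print(expand(b"\x01sharp\x03"))
```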

[0036] FIG. 3 shows an example of a URL character string before replacement using the character string list 50 (left column) and the URL character string after replacement (right column). In FIG. 3, the part "http://www." of the character string in the left column matches the left column of the first row in FIG. 2. According to FIG. 2, this character string is replaced with ASCII code "1". Likewise, the final ".co.jp/" of the character string in the left column of FIG. 3 matches the left column of the third row of the table in FIG. 2, and the corresponding ASCII code is "3". Accordingly, "http://www.sharp.co.jp/" is converted to "[1]sharp[3]" as shown in the right column of FIG. 3 (square brackets indicate a code).

[0037] After the partial character strings have been converted into the corresponding ASCII codes as far as possible by referring to the character string list 50 in this way, the hash calculation processing 54 is performed on the converted character string. The URL is then assigned to one of the hash areas 40 according to the calculated hash code. When a URL is being saved, the replaced URL and the disk address of the corresponding cache file are stored in the hash area 40. When the disk storage area of the corresponding cache file is being looked up from a URL, the replaced URL character string is searched for in the corresponding hash area 40, and the disk address attached to it is returned to the browser 20.
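The flow of paragraphs [0036] and [0037] (replace, hash, then store or look up) might be sketched as follows. This is only an illustration under stated assumptions: the table holds just the three pairs quoted from FIG. 2, entries are tried in table order, URLs are assumed to be ASCII, the hash formula is the conventional SUM × 5 + U_i shown earlier (the text above does not say which formula the embodiment uses), and the names `compress`, `store`, and `lookup` are hypothetical.

```python
# Three pairs quoted from FIG. 2, tried in this order (assumed).
TABLE = [("http://www.", 1), ("http://", 2), (".co.jp/", 3)]
hash_areas = {}  # hash code -> list of (compressed URL, disk address)


def compress(url: str) -> bytes:
    """Replace listed partial strings with their 1-byte codes (ASCII URLs assumed)."""
    out = bytearray()
    i = 0
    while i < len(url):
        for text, code in TABLE:
            if url.startswith(text, i):
                out.append(code)
                i += len(text)
                break
        else:
            out.append(ord(url[i]))  # ordinary character, kept as-is
            i += 1
    return bytes(out)


def hash_code(data: bytes) -> int:
    total = 0
    for byte in data:
        total = total * 5 + byte
    return total & 0xFFFFFFFF


def store(url: str, disk_address: int) -> None:
    key = compress(url)
    hash_areas.setdefault(hash_code(key), []).append((key, disk_address))


def lookup(url: str):
    key = compress(url)
    for stored, disk_address in hash_areas.get(hash_code(key), []):
        if stored == key:  # comparison runs over the short compressed strings
            return disk_address
    return None


store("http://www.sharp.co.jp/", 4096)
print(compress("http://www.sharp.co.jp/"), lookup("http://www.sharp.co.jp/"))
```

In this example the 23-character URL becomes a 7-byte key, so both the hash loop and any direct comparison inside a hash area touch fewer bytes.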

[0038] This produces the following effects. First, because the hash calculation is performed on the URL after its partial character strings have been replaced with the corresponding shorter codes, the character string subjected to the hash calculation is itself shorter and the amount of computation for the hash calculation decreases. In particular, the increase in computation can be held down when a complex hash formula is chosen in order to disperse the hash values efficiently, so the hash calculation can be made faster. There is also the effect that, to achieve the same degree of dispersion, a very complex hash formula need not be used, because the target character string is shorter.

[0039] Also, because the URL stored in a hash record 42 of a hash area 40 is the shorter character string after replacement, a hash area 40 of the same capacity can store a larger number of URLs. Alternatively, less capacity is needed in the hash area 40 to store the same number of URLs. In other words, the storage area for the hash areas 40 can be used effectively. Furthermore, even when URL character strings must be compared within the same hash area 40, the character strings being compared are short, so the comparison can be performed at high speed.

[0040] As described above, the URL compression device 22 is actually realized by software executed on a computer such as a personal computer or a workstation. FIG. 4 shows the external appearance of a computer that implements the method of calculating a hash value for a character string. Referring to FIG. 4, this computer 120 includes a computer main body 130 provided with a CD-ROM (Compact Disc Read-Only Memory) drive 144 and an FD (Flexible Disk) drive 142, a monitor 148, a printer 146, a keyboard 136, and a mouse 134.

[0041] FIG. 5 shows the configuration of the computer 120 in block diagram form. As shown in FIG. 5, the computer main body 130 includes, in addition to the FD drive 142 and the CD-ROM drive 144, a CPU (Central Processing Unit) 132, a memory 138, and a fixed disk 140, which are connected to one another by a bus. A CD-ROM 152 is loaded into the CD-ROM drive 144. An FD 150 is loaded into the FD drive 142.

[0042] As already described, this method of calculating a hash value for a character string is realized by computer hardware and software executed by the CPU 132. Such software is generally distributed stored on a storage medium such as the CD-ROM 152 or the FD 150, is read from the storage medium by the CD-ROM drive 144, the FD drive 142, or the like, and is first stored on the fixed disk 140. It is then read from the fixed disk 140 into the memory 138 and executed by the CPU 132. The computer hardware shown in FIGS. 4 and 5 is itself of a general kind. The most essential part of the present invention is therefore the software stored on a storage medium such as the CD-ROM 152, the FD 150, or the fixed disk 140.

[0043] Since the operation of the computer shown in FIGS. 4 and 5 is itself well known, a detailed description thereof is not repeated here.

[0044] The list creation processing 56 shown in FIG. 1 will now be described in detail with reference to FIG. 6. In the apparatus of this embodiment, the character string list 50, a table that uniquely associates each character string before replacement with a character string after replacement, is created from the past access history file 46. This is because using the past history is considered to allow the character string list 50 to be created so that replacement of URL character strings is as efficient as possible. If no such access history file 46 is available, however, the character string list 50 may be created by hand on the basis of theoretical considerations. Alternatively, a character string list 50 created from the access history of another site may be used. It suffices to prepare the character string list 50 so that character strings that appear relatively frequently in the URLs referenced by the browser 20, and that are as long as possible, are replaced with short data.

[0045] In this embodiment, the list creation processing 56 is performed only once, at the outset, based on the access history file 46, in order to prepare the character string list 50. However, after operation has started, the list creation processing 56 may be performed as needed on the basis of actual usage to re-create the character string list 50. In that case, of course, the contents of the hash memory 26 that were created based on the old character string list 50 must be rebuilt to match the re-created character string list 50.

[0046] Referring to FIG. 6, first, the number of appearances of partial character strings in the URLs is tallied for each partial character string (200). Here the tally is taken not only over whole URLs but also over the partial character strings contained in each URL: for every such character string, the number of times it appears in the access history file 46 is tallied together with its length. The partial character strings to be tallied need only be longer than the code that replaces them, so in this embodiment all partial character strings of three or more characters are tallied. The result is a table listing every character string of three or more characters, its length, and the number of times it appears in the access history file 46.
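The tally of step 200 can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function name `count_substrings`, the `min_len` parameter, and the sample history are hypothetical, and the character string length is not stored separately because it can be obtained from each key with `len()`.

```python
from collections import Counter

def count_substrings(urls, min_len=3):
    """Tally every substring of at least min_len characters over all URLs."""
    counts = Counter()
    for url in urls:
        n = len(url)
        for start in range(n):
            # Every substring that begins at `start` and has >= min_len characters.
            for end in range(start + min_len, n + 1):
                counts[url[start:end]] += 1
    return counts

# Hypothetical access history (not taken from the figures).
history = ["http://www.a.jp", "http://www.b.jp", "http://c.jp"]
table = count_substrings(history)
print(table["http://"], table["http://www"])  # prints: 3 2
```

Enumerating all substrings is quadratic in the URL length, which is acceptable here because the list creation processing runs rarely compared with URL lookups.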

[0047] Next, 0 is assigned to a variable i (202). The variable i counts the number of character strings listed in the character string list 50, and is given its initial value here.

[0048] Next, 1 is added to the variable i (204), and it is determined whether the value of i is greater than the number predetermined as the maximum number of character strings in the character string list 50 (30 in this embodiment) (206). If the result is YES, the processing ends. If the result is NO, control proceeds to step 208.

[0049] In step 208, the partial character string that yields the greatest compression effect is selected from the tally table obtained in step 200. The compression effect is obtained, for example, by the following equation.

[0050]

[Equation 2] total compressed length = (character string length - size after compression) × number of appearances

The partial character string with the largest total compressed length calculated in this way is selected in step 208 and added to the character string list 50 in step 210. This partial character string is then deleted from the original tally table (212). Consider, for example, the case where the tally table is as shown in FIG. 7. In this case, the character string with the largest total compressed length calculated by the above equation is "http://www" (total compressed length = (10 - 1) × 800 = 7200). Accordingly, "http://www" is added to the character string list 50 and deleted from the tally table.

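The selection by total compressed length can be checked with a few lines of Python. This is a sketch only: the function name `total_compressed_length` and the `code_size` parameter are hypothetical, and the table holds just the two rows of FIG. 7 whose values are stated in the text ("http://www" appearing 800 times and "http://" appearing 900 times).

```python
def total_compressed_length(s, appearances, code_size=1):
    """(character string length - size after compression) x number of appearances."""
    return (len(s) - code_size) * appearances

# Only the two rows whose counts are given in the description.
table = {"http://www": 800, "http://": 900}

best = max(table, key=lambda s: total_compressed_length(s, table[s]))
print(best, total_compressed_length(best, table[best]))  # prints: http://www 7200
```

Although "http://" appears more often, its gain is (7 - 1) × 900 = 5400, so the longer string is preferred.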
[0051] At this point, because this partial character string has been deleted, the tally table is recalculated (214): for each character string in the original tally table that is contained in the deleted partial character string, the number of appearances of the deleted partial character string is subtracted from its own number of appearances. In the example of FIG. 7, the character string listed in the table of FIG. 7 that is a partial character string of "http://www" is "http://". Since "http://www" appeared 800 times according to FIG. 7, the number of appearances of "http://" becomes 100, obtained by subtracting 800 from the original 900 (FIG. 8).

[0052] After the tally table has been recalculated in this way, control returns to step 204, and steps 204 through 214 are repeated until the determination in step 206 ends the processing. Of course, if no character strings remain in the tally table partway through, the processing may be ended at that point.
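The loop of steps 202 through 214 can be sketched as a greedy selection. The following is an illustrative sketch, not a definitive implementation: the function name `build_string_list` and the third table row (".co.jp", 500 appearances) are hypothetical, and dropping entries whose count falls to zero or below is an added assumption that the description does not state.

```python
def build_string_list(tally, max_entries=30, code_size=1):
    """Greedily pick the substrings with the largest total compressed length."""
    tally = dict(tally)          # work on a copy of the tally table
    selected = []
    # Steps 204/206: stop after max_entries selections, or when the table is empty.
    while tally and len(selected) < max_entries:
        # Step 208: pick the entry with the largest total compressed length.
        best = max(tally, key=lambda s: (len(s) - code_size) * tally[s])
        best_count = tally.pop(best)   # step 212: delete it from the tally table
        selected.append(best)          # step 210: add it to the string list
        # Step 214: subtract its count from every remaining entry it contains.
        for s in list(tally):
            if s in best:
                tally[s] -= best_count
                if tally[s] <= 0:      # assumption: drop exhausted entries
                    del tally[s]
    return selected

tally = {"http://www": 800, "http://": 900, ".co.jp": 500}
print(build_string_list(tally))
# prints: ['http://www', '.co.jp', 'http://']
```

After "http://www" is selected, the count of "http://" drops from 900 to 100, so ".co.jp" (gain 2500) is chosen before "http://" (gain 600).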

[0053] Next, the processing for storing a URL will be described with reference to FIG. 9. First, a URL and the disk address of the cache file corresponding to that URL are received from the browser 20 (250). Next, the character string list 50 is read (252).

[0054] Then, starting from the beginning of the URL character string, the URL is compared against the character string list 50 to see whether it contains a character string matching any row of the list. First, in step 254, it is determined whether the character being processed is the last character. If it is, the comparison ends and control proceeds to step 260. Step 260 and the subsequent steps are described later.

[0055] If it is determined in step 254 that the character being processed is not the last character, it is determined in step 256 whether any character string starting at this character matches any character string in the character string list 50. If there is no match, processing advances to the next character and control returns to step 254. If there is a match, the matching character string is replaced with the code that the character string list 50 associates with it (258). Processing then advances to the next character and control returns to step 254. In this way, character strings in the input URL are successively replaced with codes.
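The scan of steps 254 through 258 can be sketched as follows. This is a sketch under assumptions: the function name `compress_url` and the sample list are hypothetical, and preferring the longest match when several list entries start at the same position is an assumption, since the description does not say which match is taken.

```python
def compress_url(url, string_list):
    """Replace listed substrings of url with their one-character codes."""
    out = []
    i = 0
    while i < len(url):                      # step 254: stop after the last character
        match = None
        for s in string_list:                # step 256: does a listed string start here?
            if url.startswith(s, i) and (match is None or len(s) > len(match)):
                match = s                    # assumption: prefer the longest match
        if match is None:
            out.append(url[i])               # no match: keep the character
            i += 1
        else:
            out.append(string_list[match])   # step 258: emit the code instead
            i += len(match)
    return "".join(out)

# Hypothetical list using control characters 1 and 2 as replacement codes.
string_list = {"http://www": "\x01", ".co.jp": "\x02"}
compressed = compress_url("http://www.sharp.co.jp/index.html", string_list)
print(repr(compressed))  # prints: '\x01.sharp\x02/index.html'
```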

[0056] When the result of the determination in step 254 is YES, every part of the input URL that should be replaced with a code has been replaced, and the URL is considerably shorter than it was originally. In step 260, a hash is then calculated from the URL whose character strings have been converted into codes in this way. The hash calculation formula may be the one already described, or a more complex one that distributes the values more evenly. Because the URL character string being hashed is shorter than the original URL character string, the hash calculation is fast. Consequently, even if the hash calculation is made more complex, the processing time does not increase unduly.

[0057] Based on the hash code calculated in this way, the hash area 40 in which the URL is to be stored is selected, and the replaced URL and the disk address of the cache file corresponding to the URL are added to that area (or updated there), for example as its last record (262).
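Steps 260 and 262 can be sketched as follows. Everything concrete here is an assumption for illustration: the number of hash areas `NUM_AREAS`, the simple character-sum hash in `hash_value` (a stand-in, not necessarily the formula the description refers to), the list-of-lists layout of the hash memory, and the sample disk addresses.

```python
NUM_AREAS = 16  # hypothetical number of hash areas

def hash_value(compressed_url):
    """Stand-in hash: sum of character codes modulo the number of areas."""
    return sum(ord(c) for c in compressed_url) % NUM_AREAS

def store(hash_memory, compressed_url, disk_address):
    """Step 262: append the record to its hash area, or update it if present."""
    area = hash_memory[hash_value(compressed_url)]
    for idx, (url, _) in enumerate(area):
        if url == compressed_url:
            area[idx] = (compressed_url, disk_address)   # update existing record
            return
    area.append((compressed_url, disk_address))          # add as the last record

hash_memory = [[] for _ in range(NUM_AREAS)]
key = "\x01.sharp\x02/"             # a URL after replacement (hypothetical codes)
store(hash_memory, key, 4096)
store(hash_memory, key, 8192)       # same URL again: the record is updated
print(hash_memory[hash_value(key)])
```

Because the stored key is the replaced URL, each record is also smaller than it would be with the original URL.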

[0058] Referring to FIG. 10, the processing for searching the hash memory 26 in response to a URL input from the browser 20 will now be described. In FIG. 10, processing identical to that in FIG. 9 is given the same step numbers. The processing performed in those steps is the same, so its description is not repeated here. FIG. 10 differs from FIG. 9 in that, in place of step 262 of FIG. 9, it has a step (270) of accessing the hash area 40 determined by the hash code and checking whether the URL exists there, and a step (272) of returning to the browser 20 the disk address attached to the URL if the URL exists, or information indicating that it does not exist otherwise.
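The search of steps 270 and 272 can be sketched in the same way. As before, the hash function, the number of areas, the memory layout, and the use of `None` as the "does not exist" indication are assumptions made for illustration only.

```python
NUM_AREAS = 16  # hypothetical number of hash areas

def hash_value(compressed_url):
    """Stand-in hash: sum of character codes modulo the number of areas."""
    return sum(ord(c) for c in compressed_url) % NUM_AREAS

def lookup(hash_memory, compressed_url):
    """Steps 270/272: return the disk address, or None if the URL is not stored."""
    area = hash_memory[hash_value(compressed_url)]
    for url, disk_address in area:
        if url == compressed_url:     # direct comparison of the short, replaced URLs
            return disk_address
    return None

hash_memory = [[] for _ in range(NUM_AREAS)]
key = "\x01.sharp\x02/"               # a URL after replacement (hypothetical codes)
hash_memory[hash_value(key)].append((key, 4096))

print(lookup(hash_memory, key))             # prints: 4096
print(lookup(hash_memory, "\x01.other/"))   # prints: None
```

The URL being searched for must be replaced with the same character string list that was used when storing, otherwise the keys will not match.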

[0059] By the processing shown in FIG. 10, if the URL exists in the cache file area 24, the browser 20 can access the cache file using the returned disk address. If the URL does not exist in the cache file area 24, the browser 20 accesses the URL anew via the Internet.

[Second Embodiment]

In the first embodiment described above, codes 1 to 30 are used as the character codes after replacement. However, hash calculation could be made still more efficient if a larger number of character strings could be replaced. In that case too, it is desirable to be able to define replacement character strings common to all users, separately from those set for each user. In the second embodiment, therefore, the character string table for common replacements and the character string table for each user are separate. Except that these two kinds of character string list are used, the software is the same as in the first embodiment. Only the details of the character string lists are therefore described below.

[0060] FIG. 11 shows an example of the replacement character string list shared by multiple users in the second embodiment. FIG. 12 shows an example of a replacement character string list used for an individual user.

[0061] As noted above, in a system using the ASCII code system, codes 0 to 31 are not normally used. The first embodiment therefore used ASCII codes 1 to 30 as the codes after replacement. In that case, however, at most 32 character codes, 0 to 31, are available as character codes after replacement. This is too few to define replacement character strings for each user in addition to the common ones.

[0062] In the second embodiment, therefore, codes 1 to 30 are used as the character codes after replacement in the common replacement character string list, as in the first embodiment (see FIG. 11), while in the per-user character string list ASCII code "31" serves as an escape code, and a user-defined character code after replacement is expressed by code "31" together with the one byte that follows it (see FIG. 12). With the escape code, the byte that follows is no longer restricted in range, so 256 user-defined codes, from "31+0" to "31+255", are available, and URLs can be managed efficiently according to each user's circumstances.

[0063] FIG. 13 shows examples of pairs of character strings before and after URL character strings are replaced using the character string lists of FIGS. 11 and 12. In the URL shown in the first row of the left column of FIG. 13, the character string "http://www.sharp.co.jp" appears in the first row of the left column of FIG. 12. The corresponding character code is "0", and it must be preceded by the escape code "31". The remainder of the URL, "/index.html", appears in the last row of FIG. 11, and the corresponding character code is "30". No escape code is needed in this case. Thus "http://www.sharp.co.jp/index.html" is replaced with "[31][0][30]". The result is shown in the right column of the first row of FIG. 13.
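The escape-code scheme can be sketched as follows. This is an illustrative sketch: the function name `compress`, the longest-match rule, and the use of Python `bytes` are assumptions, and the two tables contain only the one row of FIG. 11 and the one row of FIG. 12 that the description quotes.

```python
ESCAPE = 31  # ASCII code 31 introduces a user-defined code

def compress(url, common, user):
    """Replace listed strings: common codes are 1 byte, user codes are ESCAPE + 1 byte."""
    table = {s: bytes([code]) for s, code in common.items()}
    table.update({s: bytes([ESCAPE, code]) for s, code in user.items()})
    out = bytearray()
    i = 0
    while i < len(url):
        matches = [s for s in table if url.startswith(s, i)]
        if matches:
            s = max(matches, key=len)        # assumption: prefer the longest match
            out += table[s]
            i += len(s)
        else:
            out += url[i].encode("ascii")    # unreplaced characters are kept as-is
            i += 1
    return bytes(out)

common = {"/index.html": 30}               # last row of FIG. 11, per the description
user = {"http://www.sharp.co.jp": 0}       # first row of FIG. 12, per the description

result = compress("http://www.sharp.co.jp/index.html", common, user)
print(list(result))  # prints: [31, 0, 30]
```

The 33-character URL is reduced to three bytes, matching the "[31][0][30]" result described for the first row of FIG. 13.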

[0064] The same applies to the second row of FIG. 13. In this case, however, the character string "slab.tnr" cannot be replaced and so remains unchanged in the right column. Here too, because codes 1 to 31, which are not normally used, serve as the character strings after replacement, replaced character strings can be distinguished from character strings that were not replaced.

[0065] The second embodiment provides the same effects as the first embodiment. In addition, URLs can be managed more flexibly according to each user's circumstances, and because more character codes are available as character strings after replacement, URLs can be managed more efficiently.

[Third Embodiment]

In both the first and second embodiments described above, unused ASCII codes are assigned as the character strings after replacement. The present invention is not limited to this, however. For example, as in this third embodiment, two-digit numbers ("00" to "99") can be used as the character strings after replacement. In other words, any character string may be used as the character string after replacement, as long as the character string before replacement can be restored from it (that is, the replaced character string and the code after replacement are uniquely associated) and the character string is reliably shorter after replacement than before.

[0066] In the third embodiment, as shown in FIG. 14, two-digit numbers are used as the character strings after replacement. Each digit is simply the ordinary ASCII code for that digit. In this case, however, the two-digit numbers that are character strings after replacement must be distinguished from digits that were in the URL originally. In the third embodiment, therefore, every original single digit is represented as a two-digit number whose upper digit is "0" (see the right column of FIG. 14). That is, a single digit is replaced with a two-digit number whose upper digit is "0" and whose lower digit is the original digit.

[0067] With this rule, when digits are found in a replaced character string, they can be handled two at a time, and the original character string can be restored by looking up the left column of FIG. 14 from the right column.

[0068] FIG. 15 shows examples of replacing character strings (URLs) using the character string list shown in FIG. 14. In the character string shown in the left column of the first row of FIG. 15, "http://www." is replaced with "10" according to FIG. 14, "sharp" is replaced with "50", and ".co.jp/" is replaced with "12". The whole therefore becomes "105012".

[0069] In the character string shown in the left column of the second row of FIG. 15, "http://www.sharp.co.jp/" is replaced with "105012" as described above. The following "zaurus/" cannot be replaced, but the "index" after it is replaced with "13" according to FIG. 14, "0" with "00", and ".html" with "14". The whole is therefore converted to "105012zaurus/130014".
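The two-digit scheme and its restoration rule can be sketched as follows. This is a sketch under assumptions: the function names `encode` and `decode` and the longest-match rule are hypothetical, the table holds only the rows of FIG. 14 quoted in the description, and the input URL "http://www.sharp.co.jp/zaurus/index0.html" is inferred from the pieces the description lists for the second row of FIG. 15.

```python
# Rows of FIG. 14 quoted in the description; codes 00-09 stand for the digits 0-9.
TABLE = {"http://www.": "10", "sharp": "50", ".co.jp/": "12",
         "index": "13", ".html": "14"}
TABLE.update({str(d): "0" + str(d) for d in range(10)})
REVERSE = {code: s for s, code in TABLE.items()}

def encode(url):
    """Replace listed strings and single digits with two-digit codes."""
    out, i = [], 0
    while i < len(url):
        matches = [s for s in TABLE if url.startswith(s, i)]
        if matches:
            s = max(matches, key=len)     # assumption: prefer the longest match
            out.append(TABLE[s])
            i += len(s)
        else:
            out.append(url[i])
            i += 1
    return "".join(out)

def decode(text):
    """Read digits two at a time and look up the original string."""
    out, i = [], 0
    while i < len(text):
        if text[i].isdigit():
            out.append(REVERSE[text[i:i + 2]])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

url = "http://www.sharp.co.jp/zaurus/index0.html"
print(encode(url))                  # prints: 105012zaurus/130014
print(decode(encode(url)) == url)   # prints: True
```

Because every digit in the encoded text belongs to a two-digit code, decoding never confuses an original digit with a replacement code.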

[0070] As this shows, various schemes are conceivable for choosing the character strings after replacement. The point is to choose them so that as many URLs as possible are replaced with character strings that are as short as possible, and so that the original URL can be derived without error from the replaced character string.

[0071] In the third embodiment, a single digit is replaced with a two-digit number, so the character string may become longer in places. However, a frequently appearing, very long character string can be replaced with just two digits, so overall the URL character string is compressed as in the first and second embodiments. Because the hash calculation is performed on the URL compressed in this way, the amount of calculation is small and less hash memory area is needed. Moreover, when hash code values coincide, the direct comparison of character strings is also fast, because the character strings being compared are themselves short.

[Brief description of the drawings]

FIG. 1 is a block diagram showing a URL compression device for implementing the method according to the first embodiment of the present invention, together with surrounding elements.

FIG. 2 is a diagram schematically showing the character string list 50.

FIG. 3 is a diagram schematically showing URLs before and after replacement.

FIG. 4 is an external view of the computer shown in FIG. 5.

FIG. 5 is a block diagram of a computer for implementing the method according to the first embodiment of the present invention.

FIG. 6 is a flowchart outlining the list creation processing 56.

FIG. 7 is a diagram schematically showing an example of the tally table before recalculation.

FIG. 8 is a diagram schematically showing an example of the tally table after recalculation.

FIG. 9 is a flowchart of the URL storage processing.

FIG. 10 is a flowchart of the URL search processing.

FIG. 11 is a diagram schematically showing the common character string replacement table used in the method according to the second embodiment of the present invention.

FIG. 12 is a diagram schematically showing the user-defined character string replacement table used in the method according to the second embodiment of the present invention.

FIG. 13 is a diagram schematically showing URLs before and after character string replacement by the method according to the second embodiment.

FIG. 14 is a diagram schematically showing the user-defined character string replacement table used in the method according to the third embodiment of the present invention.

FIG. 15 is a diagram schematically showing URLs before and after character string replacement by the method according to the third embodiment.

[Explanation of symbols]

20 browser, 22 URL compression device, 24 cache file area, 26 hash memory, 46 access history, 50 character string list, 52 character string replacement processing, 54 hash calculation processing, 56 list creation processing

Claims

[Claims]

1. A method of calculating a hash value for a character string to be processed in which character sequences appear with biased frequencies, comprising the steps of: preparing a machine-readable table that uniquely associates specific character strings with shorter converted character strings; converting, using a computer and with reference to the table, a character string appearing in the character string to be processed into the corresponding converted character string; and calculating, using a computer, a hash value based on the character string to be processed including the converted character string.

2. The method of calculating a hash value for a character string according to claim 1, wherein the step of preparing the table includes the steps of: preparing, in a computer, character strings to be processed that appeared in the past; tallying, using a computer, the number of appearances and the character string length of each partial character string of the character strings to be processed that appeared in the past; selecting, based on the tallied numbers of appearances and character string lengths, from among the partial character strings of the character strings to be processed that appeared in the past, the partial character string that, when replaced with a predetermined character, most efficiently compresses the character strings to be processed that appeared in the past, and adding it to the table; recalculating the numbers of appearances taking the selected partial character string into account; and repeating the adding step until a predetermined condition is satisfied.

3. A machine-readable recording medium on which is recorded a program for implementing a method of calculating a hash value for a character string to be processed in which character sequences appear with biased frequencies, wherein the method includes the steps of: preparing a machine-readable table that uniquely associates specific character strings with shorter converted character strings; converting, with reference to the table, a character string appearing in the character string to be processed into the corresponding converted character string; and calculating a hash value based on the character string to be processed including the converted character string.

4. The machine-readable recording medium according to claim 3, on which is recorded a program for implementing the method of calculating a hash value for a character string, wherein the step of preparing the table includes the steps of: preparing character strings to be processed that appeared in the past; tallying the number of appearances and the character string length of each partial character string of the character strings to be processed that appeared in the past; selecting, based on the tallied numbers of appearances and character string lengths, from among the partial character strings of the character strings to be processed that appeared in the past, the partial character string that, when replaced with a predetermined character, most efficiently compresses the character strings to be processed that appeared in the past, and adding it to the table; recalculating the numbers of appearances taking the selected partial character string into account; and repeating the adding step until a predetermined condition is satisfied.