JP5595957B2

JP5595957B2 - Access log processing system and method, program, and access log storage / retrieval device

Info

Publication number: JP5595957B2
Application number: JP2011068240A
Authority: JP
Inventors: 崇行中馬
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2011-03-25
Filing date: 2011-03-25
Publication date: 2014-09-24
Anticipated expiration: 2031-03-25
Also published as: JP2012203685A

Description

The present invention relates to an access log processing system, method, and program, and to an access log storage/retrieval device. In particular, it relates to a compressed storage technique that enables high-speed search by exploiting the characteristics of access log searches, and to an access log storage and retrieval technique for quickly searching access logs that contain a unique character string (data that uniquely identifies a user).

In recent years, as Web access has increased, so have Web access logs and the investigations concerning Web access. When such an investigation is conducted, the stored Web access logs are searched using the identification information of the user in question as the key, and the matching Web access logs are extracted. Two properties matter here: the Web access logs needed to answer a user's inquiry contain that user's identification information, and, as a characteristic of Web access, URL data such as the host part often repeats the same character string over many consecutive records.
Patent Document 1 describes an information processing apparatus that, for files with identical content on a file server, registers entries linked to the same underlying file and treats them as logically separate files in a management table, thereby saving the storage space that duplicate data would otherwise occupy.

Patent Document 1: JP 2009-237979 A

Conventional techniques rely mainly on general-purpose file compression or on database storage. When they are used to search for the Web access logs of a specific user, their storage schemes are not specialized for Web access logs, so compression efficiency may be insufficient, or storing the data in compressed form may lengthen search processing time.
In addition, conventional techniques sometimes compress Web access log data before storing it to reduce storage size, but restoring (decompressing) the logs during a search then takes time and can slow the search response.

In view of the above, an object of the present invention is to enable high-speed search of access logs by creating search metadata in which position information is recorded, and by compressing and storing the access log data.

To solve the above problems, the present invention exploits two observations: access logs (Web or other) grouped per user contain runs of similar consecutive data, and a user's inquiry can be answered by extracting that user's access logs. It therefore adopts a data storage scheme specialized for access log search work, which provides efficient compression and high-speed search with fewer accesses to the HDD or other storage during a search. Furthermore, because access logs are stored grouped by user, extracting the Web access logs of one particular user requires fewer disk accesses for reading the metadata and the corresponding access logs.
According to one aspect of the present invention, a Web access log storage/retrieval system comprises a Web access log storage function unit, a Web access log storage HDD, and a Web access log search function unit. The Web access log storage function unit processes and compresses Web access logs and creates metadata, taking the characteristics of Web access log searches into account, and stores the results in the Web access log storage HDD. The Web access log search function unit reads the metadata from the Web access log storage HDD and retrieves the portion of the logs located from the user information. In addition to ordinary Web access log search results, the search can also output the difference data only.

According to a first solution of the present invention, there are provided an access log processing system and an access log processing method in that system, the access log processing system comprising:
a plurality of servers that provide Web pages, content, or other information;
terminals of users who access the servers; and
an access log storage/retrieval device that processes access logs recording users' accesses to the servers,
wherein the access log storage/retrieval device comprises:
a plurality of access log files, divided by one or more initial characters of the user identification information, that store access logs including, in association with user identification information, a time and access identification information of the Web page, content, or other information; and
a metadata file that stores metadata including, in association with user identification information, an access log file name and data storage position information of the access logs,
and wherein the access log storage/retrieval device:
collects access logs of the information output by each server to the terminals in response to accesses from each user;
where, among a plurality of access logs, duplicate data exists in the user identification information, the time, or the access identification information, deletes each such piece of duplicate data to create one or more pieces of access log data per piece of user identification information;
creates, on the basis of the access log data for each piece of user identification information, metadata including the access log file name and the data storage position information for that user identification information;
stores the access log data for each piece of user identification information in the access log file for the corresponding initial character(s), and stores the metadata for each piece of user identification information in the metadata file; and
upon receiving a search processing request specifying user identification information, searches the metadata file on the basis of the user identification information contained in the request and extracts the access log file name and the data storage position information from the metadata;
reads the access logs from the access log file having the extracted file name in accordance with the extracted data storage position information; and
creates access logs either by restoring the deleted duplicate data or, without restoring it, by writing a blank or a symbol or character indicating duplication in its place, and displays the created access logs on a display unit or stores them in a storage unit.

According to a second solution of the present invention, there is provided an access log processing program for an access log processing system comprising:
a plurality of servers that provide Web pages, content, or other information;
terminals of users who access the servers;
an access log storage/retrieval device that processes access logs recording users' accesses to the servers;
a plurality of access log files, divided by one or more initial characters of the user identification information, that store access logs including, in association with user identification information, a time and access identification information of the Web page, content, or other information; and
a metadata file that stores metadata including, in association with user identification information, an access log file name and data storage position information of the access logs,
the program causing the access log storage/retrieval device to execute:
a step of collecting access logs of the information output by each server to the terminals in response to accesses from each user;
a step of, where, among a plurality of access logs, duplicate data exists in the user identification information, the time, or the access identification information, deleting each such piece of duplicate data to create one or more pieces of access log data per piece of user identification information;
a step of creating, on the basis of the access log data for each piece of user identification information, metadata including the access log file name and the data storage position information for that user identification information;
a step of storing the access log data for each piece of user identification information in the access log file for the corresponding initial character(s), and storing the metadata for each piece of user identification information in the metadata file;
a step of, upon receiving a search processing request specifying user identification information, searching the metadata file on the basis of the user identification information contained in the request and extracting the access log file name and the data storage position information from the metadata;
a step of reading the access logs from the access log file having the extracted file name in accordance with the extracted data storage position information; and
a step of creating access logs either by restoring the deleted duplicate data or, without restoring it, by writing a blank or a symbol or character indicating duplication in its place, and displaying the created access logs on a display unit or storing them in a storage unit.

According to a third solution of the present invention, there is provided an access log storage/retrieval device that processes access logs recording users' accesses to a plurality of servers that provide Web pages, content, or other information, the device comprising:
a plurality of access log files, divided by one or more initial characters of the user identification information, that store access logs including, in association with user identification information, a time and access identification information of the Web page, content, or other information;
a metadata file that stores metadata including, in association with user identification information, an access log file name and data storage position information of the access logs;
an access log storage function unit that processes the access logs and stores the access logs and the metadata in the access log files and the metadata file, respectively; and
an access log search function unit that, in accordance with an access log search request, searches the access log files and the metadata file for access logs and metadata, respectively,
wherein the access log storage function unit:
collects access logs of the information output by each server to the terminals in response to accesses from each user;
where, among a plurality of access logs, duplicate data exists in the user identification information, the time, or the access identification information, deletes each such piece of duplicate data to create one or more pieces of access log data per piece of user identification information;
creates, on the basis of the access log data for each piece of user identification information, metadata including the access log file name and the data storage position information for that user identification information; and
stores the access log data for each piece of user identification information in the access log file for the corresponding initial character(s), and stores the metadata for each piece of user identification information in the metadata file,
and wherein the access log search function unit:
upon receiving a search processing request specifying user identification information, searches the metadata file on the basis of the user identification information contained in the request and extracts the access log file name and the data storage position information from the metadata;
reads the access logs from the access log file having the extracted file name in accordance with the extracted data storage position information; and
creates access logs either by restoring the deleted duplicate data or, without restoring it, by writing a blank or a symbol or character indicating duplication in its place, and displays the created access logs on a display unit or stores them in a storage unit.

According to the present invention, access logs can be searched at high speed by creating search metadata in which position information is recorded and by compressing and storing the access log data.
Also, because metadata recording position information is created and only the relevant data is read from a file or disk on the basis of that position information, search results can be obtained faster than with a method that reads and outputs all of the data every time. Furthermore, particularly for accesses in which the same data tends to repeat consecutively, such as Web access, storing only the differences allows access log data to be stored on, and retrieved from, the HDD efficiently.

FIG. 1 is a system configuration diagram/block diagram of an embodiment of a Web access log storage/retrieval system to which the present invention is applied.
FIG. 2 is a sequence diagram (data storage process) of an embodiment of the Web access log storage process to which the present invention is applied.
FIG. 3 is a sequence diagram (data search process) of an embodiment of the Web access log search process to which the present invention is applied.
FIG. 4 is an explanatory diagram (data block diagram) of the Web access log storage HDD 45 to which the Web access log storage process of FIG. 2 is applied.
FIG. 5 shows a search result screen (1) produced by the Web access log search process of FIG. 3.
FIG. 6 shows a search result screen (2) (CSV file) produced by the Web access log search process of FIG. 3.
FIG. 7 is a flowchart of the Web access log duplicate data deletion process 37.
FIG. 8 is a flowchart of the Web access log metadata creation process 41.
FIG. 9 shows examples of Web server status codes.
FIG. 10 is an explanatory diagram of a metadata file.
FIG. 11 is an explanatory diagram of an access log file.

Embodiments of the present invention are described in detail below with reference to the drawings.
FIG. 1 is a system configuration diagram/block diagram of an embodiment of a Web access log storage/retrieval system to which the present invention is applied. In FIG. 1, the system comprises a user terminal 1, a Web server 2 accessed from the user terminal 1, a Web access log storage/retrieval device 100, a terminal 6 of a Web access log searcher, and a network such as the Internet that connects the terminal 1 with the Web access log storage/retrieval device 100. The searcher's terminal 6 may also be connected to the Web access log storage/retrieval device 100 via a network such as the Internet.
The Web access log storage/retrieval device 100 has a Web access log storage function unit 3, a Web access log storage HDD 4, and a Web access log search function unit 5. The Web access log storage function unit 3 processes the Web access logs that record users' accesses and stores them in the Web access log storage HDD 4. The Web access log search function unit 5 receives per-user Web access log search requests from the Web access log searcher (terminal 6), reads from the Web access log storage HDD 4 the metadata and Web access logs that the Web access log storage function unit 3 stored in the metadata file 15 and the Web access log files 16 and 17, respectively, and performs the search. The Web access log storage function unit 3 includes a Web access log collection process 7, a division process 8 by Web access log key, a merge process 9 by Web access log key data, a Web access log sort process 10, a Web access log duplicate data deletion process 11, a compression process 12 by Web access log key, a Web access log metadata creation process 13, and a Web access log HDD storage process 14. The Web access log storage HDD 4 holds the metadata file 15 and the Web access log files 16 and 17 generated by the Web access log storage function unit 3. The device for recording Web access logs is not limited to an HDD; any suitable memory device, such as flash memory or an SD card, can be used. The Web access log search function unit 5 includes a Web access log search reception process 18, a metadata reading process 19 for the search key, a process 20 for identifying the read location in the corresponding Web access log, a process 21 for reading the corresponding portion of the Web access log, a decompression process 22 for the corresponding data, a duplicate data restoration process 23, and a search result response process 24.
The Web access log storage/retrieval device 100 and/or the searcher terminal 6 also includes a display unit for displaying the search results of the Web access log search function unit 5.

FIG. 10 is an explanatory diagram of the metadata file.
The metadata stored in the metadata file includes, in association with user identification information, the name of the access log file in which that user's access log data is stored, and the data storage position of the access logs, expressed as the user's data start position and data size. The user identification information is information, such as a character string, that uniquely identifies a user: for example, a telephone number, an e-mail address, a login ID, or a user ID. Instead of a start position and a size, the data storage position may be expressed by other suitable position information, such as a data storage start position and a data storage end position.
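As a minimal sketch, assuming the metadata file is a CSV text file with one `user_id,file_name,start,size` row per user (the patent does not fix an on-disk format, and `MetadataEntry` and `load_metadata` are hypothetical names), the lookup described above reduces to a dictionary keyed by user identification information. The sample row reuses the "ab102" / "file-a" / 0 / 520 values from the FIG. 4 example.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataEntry:
    """One metadata record (cf. FIG. 10): where one user's access logs live."""
    user_id: str    # user identification information (the key)
    file_name: str  # access log file holding this user's data
    start: int      # data start position within that file
    size: int       # data size

    @property
    def end(self) -> int:
        # The start/end representation mentioned as an alternative.
        return self.start + self.size


def load_metadata(text: str) -> dict:
    """Parse 'user_id,file_name,start,size' lines into a dict keyed by user ID."""
    index = {}
    for row in csv.reader(io.StringIO(text)):
        if row:  # skip blank lines
            user_id, file_name, start, size = row
            index[user_id] = MetadataEntry(user_id, file_name, int(start), int(size))
    return index


metadata = load_metadata("ab102,file-a,0,520\n")
entry = metadata["ab102"]
print(entry.file_name, entry.start, entry.end)  # file-a 0 520
```

A dictionary gives a constant-time lookup per search; a real metadata file could equally be kept sorted by key and binary-searched.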

FIG. 11 is an explanatory diagram of the access log file.
A plurality of access log files are provided, one for each initial of the user identification information, and each is given an access log file name. The "initial" used as the unit may be a letter of English, Japanese, or another language, or a symbol, and may be a single initial character (e.g., "a", "b", ..., or "あ", "い", ...) or several initial characters (e.g., "aa", "ab", ..., or "ああ", "あい", ...). Each access log contains, in association with the user identification information, a time, a URL, a user IP address, and a status code. The URL is only an example; any suitable access identification information for Web pages, content, or other information can be used. The user IP address can be the address of the user's terminal. Access log files have the same format not only on the Web access log storage HDD 4 but also in any temporary storage area within the Web access log storage/retrieval device 100, such as those shown in FIG. 2.
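The partitioning rule above can be sketched as a single function that maps a user ID to its access log file. The "file-" prefix follows the FIG. 4 example ("file-a"); the function name `log_file_name` and the `prefix_len` parameter are illustrative assumptions, not names from the patent.

```python
def log_file_name(user_id: str, prefix_len: int = 1) -> str:
    """Name of the access log file that holds user_id's records.

    Files are partitioned by the first prefix_len characters of the user
    identification information. Only the "file-" prefix comes from the
    FIG. 4 example; the rest is an illustrative assumption.
    """
    if len(user_id) < prefix_len:
        raise ValueError("user_id is shorter than the partition prefix")
    return "file-" + user_id[:prefix_len]


print(log_file_name("ab102"))     # file-a
print(log_file_name("ab102", 2))  # file-ab
print(log_file_name("あい01"))     # file-あ
```

Using two or more initial characters spreads users over more, smaller files, which shortens each file at the cost of managing more of them.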

FIG. 9 shows examples of Web server status codes.
As shown in the figure, the status code indicates the outcome (status) of each access.

FIG. 2 is a sequence diagram of an embodiment of the Web access log storage process to which the present invention is applied. This process is executed by the Web access log storage function unit 3.
In FIG. 2, in "Web access log collection process 29 (7)" (reference numbers in parentheses correspond to FIG. 1), the Web access log storage function unit 3 collects the Web access logs 28 output by each Web server in response to Web accesses from each user. Next, in "division process 31 (8) by Web access log key", the unit 3 splits the Web access logs 30 collected from each Web server into files according to the user identification information (hereinafter, the key) contained in them. For example, when Web accesses come from users "ab102", "ab105", "cy292", and "cy353", each Web server's log is split into one file for keys beginning with "a" and one file for keys beginning with "c". Next, in "merge process 33 (9) by Web access log key data", the unit 3 merges the per-server Web access logs 32 (split into files by key) by key. For example, if Web server 1 has files for initials "a" and "c" and Web server 2 also has files for "a" and "c", these four files are merged into two files, one for initial "a" and one for initial "c". Then, in "Web access log sort process 35 (10)", the unit 3 sorts the merged Web access logs 34 by key. For example, if the file for initial "a" contains the Web access logs of users "ab102" and "ab105", sorting by key places the logs of "ab102" before those of "ab105".
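The split, merge, and sort steps can be sketched in memory as follows, using the four users from the example. Records are simplified here to `(user_id, time, url)` tuples, the function names are hypothetical, and using the time as a secondary sort key is an assumption (the text only specifies sorting by key).

```python
from collections import defaultdict


def split_by_initial(server_log, prefix_len=1):
    """Division step (31): split one server's log into {initial: [records]}."""
    parts = defaultdict(list)
    for rec in server_log:
        parts[rec[0][:prefix_len]].append(rec)
    return dict(parts)


def merge_and_sort(per_server_parts):
    """Merge (33) same-initial parts from all servers, then sort (35) by key."""
    merged = defaultdict(list)
    for parts in per_server_parts:
        for initial, recs in parts.items():
            merged[initial].extend(recs)
    # Sort by user ID; the time as a tie-breaker is an assumption.
    return {initial: sorted(recs, key=lambda r: (r[0], r[1]))
            for initial, recs in sorted(merged.items())}


server1 = [("ab105", "10:00:03", "http://example.com/x"),
           ("cy292", "10:00:01", "http://example.com/y")]
server2 = [("ab102", "10:00:02", "http://example.org/z"),
           ("cy353", "10:00:04", "http://example.org/w")]
files = merge_and_sort([split_by_initial(server1), split_by_initial(server2)])
print(sorted(files))               # ['a', 'c']
print([r[0] for r in files["a"]])  # ['ab102', 'ab105']
```

Splitting per server first lets each server's log be partitioned where it is collected; the merge then only combines same-initial files.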

Next, in "Web access log duplicate data deletion process 37 (11)", the unit 3 treats a value as duplicate data when it repeats in a column of the Web access log (CSV format) and deletes it. For example, if user "ab102" has four Web access log records, the user identification information (key) is duplicated, so "ab102" is deleted from the second and subsequent rows. Likewise, when accesses occur at the same time, the repeated time information is deleted. The URL is, for example, divided into scheme, authority, and path; components that duplicate the previous record are deleted, and a partially differing value is rewritten so that only the differing portion (determined by backward matching) is kept. Repeated values in the user IP address and status code columns are deleted in the same way.
Next, in "compression process 39 (12) by Web access log key", the unit 3 compresses (e.g., ZIP compression) the de-duplicated data of each piece of user identification information and encodes it (e.g., BASE64).
Next, in "Web access log metadata creation process 41 (13)", the unit 3 creates metadata that records, for the data compressed per key, the key value, the name of the Web access log file in which the data is stored, the start position of the compressed data, and the size of the compressed data. For example, when the data of user "ab102" is recorded in the access log file for keys beginning with "a", the key value "ab102", the target file name ("file-a", the access log file for initial "a"), the start position of the compressed Web access log data of "ab102", and the size of that compressed data are appended to the metadata.
Next, in "Web access log HDD storage process 44 (14)", the unit 3 stores the key-compressed Web access logs 40 created in compression process 39 (12) and the metadata created in metadata creation process 41 (13) in the access log files and the metadata file of the Web access log storage HDD 4, respectively.
In each of the processes from "Web access log collection process 29 (7)" to "Web access log metadata creation process 41 (13)", the access logs and the metadata may instead be written directly to the access log files and the metadata file in the Web access log storage HDD 4. In that case, "Web access log HDD storage process 44 (14)" can be omitted.
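A minimal sketch of the column-wise duplicate deletion and its inverse (the restoration step) follows. It assumes absolute http(s)-style URLs, folds any query string or fragment into the path column, and assumes no original value is an empty string, because an empty string is used as the "deleted" marker. It reproduces only whole-component deletion; the finer rewriting of a partially differing value down to its changed portion is not modelled. All function names are hypothetical.

```python
from urllib.parse import urlsplit


def to_columns(user_id, time, url, ip, status):
    """Split the URL into scheme / authority / path columns (query folded into path)."""
    parts = urlsplit(url)
    path = parts.path
    if parts.query:
        path += "?" + parts.query
    if parts.fragment:
        path += "#" + parts.fragment
    return (user_id, time, parts.scheme, parts.netloc, path, ip, status)


def from_columns(cols):
    """Rebuild the original record (assumes absolute http(s)-style URLs)."""
    user_id, time, scheme, authority, path, ip, status = cols
    return (user_id, time, f"{scheme}://{authority}{path}", ip, status)


def dedup(rows):
    """Blank each value equal to the same column of the previous row.

    Apply to ONE user's rows at a time, so each user's block can later be
    restored without reading any other user's data.
    """
    out, prev = [], None
    for row in rows:
        if prev is None:
            out.append(tuple(row))
        else:
            out.append(tuple("" if v == p else v for v, p in zip(row, prev)))
        prev = row
    return out


def restore(rows):
    """Inverse of dedup: a blank value is copied down from the row above."""
    out, prev = [], None
    for row in rows:
        full = tuple(row) if prev is None else tuple(
            p if v == "" else v for v, p in zip(row, prev))
        out.append(full)
        prev = full
    return out


def mark_duplicates(rows, mark="-"):
    """Alternative output: show deleted duplicates as a marker instead."""
    return [tuple(mark if v == "" else v for v in row) for row in rows]


records = [
    ("ab102", "10:00:00", "http://example.com/a", "192.0.2.1", "200"),
    ("ab102", "10:00:00", "http://example.com/b", "192.0.2.1", "200"),
    ("ab102", "10:00:05", "https://example.com/b", "192.0.2.1", "404"),
]
rows = [to_columns(*r) for r in records]
stored = dedup(rows)
print(stored[1])  # ('', '', '', '', '/b', '', '')
```

Keeping de-duplication within one user's rows matters because each user's block is later compressed and read on its own; a blank that referred to another user's last row could not be restored.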

FIG. 4 is an explanatory diagram of the Web access log storage HDD 45 to which the Web access log storage process of FIG. 2 is applied.
The metadata 68 records, for each user, the information identifying the user together with the target file name, data storage start position, and data storage size that locate that user's data. The access log files 69 and 74 are files divided in units of user identification information. In this example, for the user identification information (key) “ab102”, the metadata 68 stores the target file name “file-a”, the start position “0”, and the size “520”. The target file “file-a” 69 is the Web access log file for keys beginning with “a”; reading size “520” from start position “0” extracts the Web access log 70 of user “ab102”. Decompressing it yields the Web access log data 72 with duplicate data deleted. If user “ab102” has four Web access log entries, the user identification information is duplicated, so it is deleted from the second and subsequent entries. Identical times are deleted in the same way. The URL is decomposed into scheme, authority, and path so that duplicates can be detected, and duplicated components are likewise deleted. When a value differs only in part, only the changed portion (determined by backward matching) is stored and the rest is deleted.
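The URL handling can be illustrated with a per-component sketch. The text does not spell out how a partly changed value is encoded, so this example assumes front coding: each changed component is stored as the number of leading characters shared with the previous value, a `|`, and the remaining tail; an unchanged component becomes `"."`. The helper names and the `n|tail` format are hypothetical, and everything after the authority (path, query, fragment) is treated as "path" so the parts always rejoin to the original URL.

```python
import os

DUP = "."  # duplicate identifier: component identical to the previous row's


def split_url(url):
    """Split 'scheme://authority/path...' into (scheme, authority, path)."""
    scheme, rest = url.split("://", 1)
    authority, slash, tail = rest.partition("/")
    return scheme, authority, slash + tail


def encode_component(cur, prev):
    """Return DUP if unchanged, else 'n|tail' where n leading chars match prev."""
    if prev is not None and cur == prev:
        return DUP
    shared = len(os.path.commonprefix([cur, prev or ""]))
    return f"{shared}|{cur[shared:]}"


def decode_component(enc, prev):
    """Invert encode_component given the previous (restored) value."""
    if enc == DUP:
        return prev
    shared, tail = enc.split("|", 1)
    return (prev or "")[: int(shared)] + tail


def encode_url(url, prev_url=None):
    prev = split_url(prev_url) if prev_url else (None, None, None)
    return [encode_component(c, p) for c, p in zip(split_url(url), prev)]


def decode_url(encoded, prev_url=None):
    prev = split_url(prev_url) if prev_url else (None, None, None)
    scheme, authority, path = (decode_component(e, p) for e, p in zip(encoded, prev))
    return f"{scheme}://{authority}{path}"
```

For two consecutive hypothetical URLs `http://example.com/news/2011/a.html` and `http://example.com/news/2011/b.html`, the second encodes to `[".", ".", "11|b.html"]`. Since every non-duplicate encoding contains `|`, it can never be confused with the bare `"."` marker.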

FIG. 7 shows a flowchart of the Web access log duplicate data deletion process 37.
In the Web access log duplicate data deletion process 37, the Web access log storage function unit reads the collected Web access logs (S101). Next, the process compares the access log data read this time with the access log data read the previous time, and converts each field that is the same as in the previous access log (user identification information, time, URL, user IP address, status code, etc.) into a duplicate identifier consisting of a predetermined unique character (e.g., “.” or a blank) (S103). For the URL, the differing character string is extracted, working from the end, for each of the scheme, authority, and path data. Next, the process stores the access log in an appropriate temporary storage area in the Web access log storage / retrieval apparatus, or in the access log file of the Web access log storage HDD 4 whose file name corresponds to the initial of the user identification information (S105). The above processing is repeated until there is no access log left to read (S107).
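Step S103 can be sketched as a comparison of each row with the row before it. This is a minimal illustration, assuming the rows are lists of strings with identical columns, already sorted by user and time, and that no real field is exactly `"."`. It is meant to be called once per user, so that every key-unit block starts with a complete row and can be restored on its own; the function name is hypothetical.

```python
DUP = "."  # predetermined duplicate identifier (the text gives "." or a blank as examples)


def dedup_rows(rows):
    """Replace every field equal to the same column of the previous row with DUP."""
    out = []
    prev = None
    for row in rows:
        if prev is None:
            out.append(list(row))  # first row of the user's block is stored in full
        else:
            out.append([DUP if cur == old else cur for cur, old in zip(row, prev)])
        prev = row  # compare against the original previous row, not its encoded form
    return out


# Hypothetical log rows: user ID, time, URL, user IP address, status code.
rows = [["ab102", "10:00", "http://a/x", "10.0.0.1", "200"],
        ["ab102", "10:00", "http://a/y", "10.0.0.1", "200"],
        ["ab102", "10:05", "http://a/y", "10.0.0.1", "404"]]
print(dedup_rows(rows))
```

Here the URL is compared as a whole field for brevity; a per-component URL comparison would slot in at the same point.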

FIG. 8 shows a flowchart of the Web access log metadata creation process 41.
In the Web access log metadata creation process 41, the Web access log storage function unit reads the Web access log in units of user identification information (keys) from the temporary storage area or from the access log file of the Web access log storage HDD 4 (S201). Next, it obtains data storage position information, such as the start position and size of the compressed data for each key (S203). Then it writes to the metadata file metadata that associates the user identification information with the access log file name and with that user's data storage position information, such as the data start position and data size (S205). The above processing is repeated until there is no access log left to read (S207).
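The position bookkeeping in S203 and S205 amounts to appending each key's encoded block to the file for its initial and recording the offset where it begins and its length. In the sketch below, in-memory buffers stand in for files on the HDD. The `file-<initial>` naming mirrors the “file-a” example of FIG. 4 but is otherwise an assumption, as is the `(key, file_name, start, size)` tuple layout.

```python
def build_store(blocks):
    """Lay out key-unit blocks in per-initial files and build the metadata.

    blocks: iterable of (user_key, encoded_bytes) in key order.
    Returns (files, metadata): files maps file name -> bytes, and metadata is a
    list of (user_key, file_name, start, size) tuples, one per user.
    """
    files = {}
    metadata = []
    for key, blob in blocks:
        name = "file-" + key[0]  # hypothetical naming: one file per initial character
        buf = files.setdefault(name, bytearray())
        start = len(buf)  # this user's data begins where the file currently ends
        buf.extend(blob)
        metadata.append((key, name, start, len(blob)))
    return {name: bytes(buf) for name, buf in files.items()}, metadata
```

With a hypothetical 520-byte block for “ab102” written first, its entry is `("ab102", "file-a", 0, 520)`, and the next “a” user starts at offset 520 in the same file.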

FIG. 3 is a sequence diagram of an embodiment of the Web access log search process to which the present invention is applied. This process is executed by the Web access log search function unit 5.
In FIG. 3, in the “Web access log search reception process 53 (18)” (reference numbers in parentheses correspond to those in FIG. 1), the Web access log search function unit 5 receives a search request from the Web access log searcher 51 and accepts the search. When the search is accepted, the user identification information (key) used as the search key is specified. For example, to search the Web access log of user “ab102”, the Web access log searcher 51 enters “ab102” in the “Web access log search reception process 53 (18)”. Next, in the “search key metadata reading process 54 (19)”, the Web access log search function unit 5 reads the metadata 55 from the metadata file. Because the metadata 55 holds only one entry per user, it is smaller than the Web access log 58. Next, in the “read location identification process 56 (20) for the corresponding Web access log”, the Web access log search function unit 5 extracts the user's entry from the metadata 55. For example, to locate the data of user “ab102”, it searches for the metadata row whose user identification information (key) matches and extracts that row's target file name, start position, and size. Next, in the “Web access log corresponding-portion reading process 57 (21)”, the Web access log search function unit 5 reads the data (Web access log) 59 of the specified user from the Web access log (key-unit compression) 58. In this reading process, the Web access log file (key-unit compression) 58 to be searched is identified from the target file name extracted in the “read location identification process 56 (20)”, and the data (Web access log) 59 is read using the start position and size.
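The lookup-and-read path (processes 54 to 57) can be sketched as a metadata scan followed by a seek and a bounded read. This assumes, hypothetically, that the metadata file is a headerless CSV with rows `key,file_name,start,size` and that all access log files live in one directory; the function names are illustrative. A linear scan is used because the text notes the metadata is small; a dictionary keyed by user ID would be an obvious alternative for repeated lookups.

```python
import csv
import os


def find_entry(metadata_path, user_key):
    """Return (file_name, start, size) for user_key, or None if absent."""
    with open(metadata_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if len(row) == 4 and row[0] == user_key:
                return row[1], int(row[2]), int(row[3])
    return None


def read_user_block(log_dir, metadata_path, user_key):
    """Read only the user's byte range instead of scanning the whole log file."""
    entry = find_entry(metadata_path, user_key)
    if entry is None:
        return None
    name, start, size = entry
    with open(os.path.join(log_dir, name), "rb") as f:
        f.seek(start)
        return f.read(size)
```

The bytes returned are still compressed and encoded; decoding and decompression follow as a separate step.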
Next, in the “corresponding data decompression process 60 (22)”, the Web access log search function unit 5 decodes and decompresses the data (Web access log) 59 it has read. This yields the user's Web access log with duplicate data deleted (CSV file) 61. Next, in the “duplicate data restoration process 62 (23)”, the Web access log search function unit 5 restores the deleted duplicate data. The restoration process 62 (23) identifies duplicated data from the duplicate identifier (a unique character such as “.”) described with reference to the flowchart of FIG. 7, and restores it. For example, if the time in the Web access log is 10:00, the time has been deleted from the second and subsequent rows, and the restoration process fills in 10:00 again. For URLs, the same host and path often repeat, so the restoration process refills values deleted from the second and subsequent rows and reconstructs values for which only the difference was recorded. Next, in the “search result response process 64 (24)”, the Web access log search function unit 5 returns the search result 66 to the Web access log searcher 51. The Web access log search function unit 5 may also save the search result in the Web access log storage HDD 4.
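The duplicate data restoration step can be sketched as the inverse of the row-level deletion: each `"."` is replaced with the value of the same column in the previously restored row. This is a minimal sketch assuming whole-field duplicate markers and a block that starts with a complete row; the function name is hypothetical.

```python
DUP = "."  # duplicate identifier written by the deletion process


def restore_rows(rows):
    """Replace each DUP field with the same column of the previous restored row."""
    restored = []
    for row in rows:
        if not restored:
            restored.append(list(row))  # first row of a user's block is complete
        else:
            prev = restored[-1]
            restored.append([old if cur == DUP else cur for cur, old in zip(row, prev)])
    return restored


# A hypothetical decompressed block for user "ab102".
stored = [["ab102", "10:00", "http://a/x", "10.0.0.1", "200"],
          [".", ".", "http://a/y", ".", "."],
          [".", "10:05", ".", ".", "404"]]
print(restore_rows(stored))
```

Displaying `stored` as-is, without calling `restore_rows`, corresponds to the difference-investigation view in which only changed values appear.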

FIG. 5 shows the search result screen (1) to which the Web access log search process of FIG. 4 is applied. In FIG. 5(A), the normal Web access result screen 79 displays the Web access log with the duplicate data restored. In FIG. 5(B), the Web access result (for difference investigation) 80 is a screen that does not restore the duplicate data, so the change points are shown to the searcher and individual Web accesses are easier to distinguish. In FIG. 5(C), likewise to make Web accesses easier to distinguish, the Web access result (for difference investigation) 81 emphasizes the duplicate data by font, weight, color, and so on, so the change points are highlighted for the searcher.

FIG. 6 shows the search result screen (2) to which the Web access log search process of FIG. 4 is applied. As shown in FIG. 6(A), the search result can be presented as a normal Web access log result CSV file 82. As shown in FIG. 6(B), a Web access log result CSV file 83 for difference investigation can also be output.

As described above, the fast Web access log search technique of this embodiment creates metadata recording position information and reads only the corresponding data from the HDD based on that position information, so search results can be obtained faster than with a method that reads and outputs all the data each time. In addition, Web accesses often follow links, so consecutive entries tend to share the same data, such as the URL host and path; storing only the differences makes storing and retrieving Web access log data on the HDD efficient.

The above describes an example in which a user's terminal accesses a Web server and the Web server provides various Web pages to the terminal. However, the invention is not limited to the Web: the terminal may access a server for any suitable content or other information, and the server may provide that information to the terminal.
The access log processing method or system and the access log storage / retrieval apparatus of the present invention can be provided as an access log processing program that causes a computer to execute each procedure, a computer-readable recording medium on which the access log processing program is recorded, a program product that includes the access log processing program and can be loaded into the internal memory of a computer, a computer such as a server that includes the program, and the like.

DESCRIPTION OF SYMBOLS
1 User's terminal
2 Web server
100 Web access log storage / retrieval apparatus
6 Web access log searcher's terminal
3 Web access log storage function unit
4 Web access log storage HDD
5 Web access log search function unit

Claims

Multiple servers providing web or content or other information;
A terminal of a user accessing the server;
In an access log processing system comprising an access log storage / retrieval device for processing an access log recording access to the server from a user,
The access log storage / retrieval device includes:
Corresponding to the user identification information, a plurality of access log files, divided by one or more initial characters of the user identification information, that store access logs including the time and the access identification information of the Web, content, or other information;
Corresponding to the user identification information, a metadata file that stores metadata including an access log file name and access log data storage location information;
With

The access log storage / retrieval device includes:
Collects access logs of the information that each server outputs to the terminal in response to access from each user;
When duplicate data exists among a plurality of access logs in any of the user identification information, the time, and the access identification information, deletes each item of duplicate data and creates one or more items of access log data for each user identification information;
Creates, based on the one or more items of access log data for each user identification information, metadata including the access log file name and the data storage location information for each user identification information;
Stores the one or more items of access log data for each user identification information in the access log file for the corresponding initial of the user identification information, and stores the metadata for each user identification information in the metadata file;
and,
When a search processing request specifying user identification information is received, searches the metadata file based on the user identification information included in the received request and extracts the access log file name and the data storage location information from the metadata;
Reads the access log from the access log file having the extracted access log file name, according to the extracted data storage location information;
Restores the deleted duplicate data to create an access log, or creates, without restoring the deleted duplicate data, an access log containing blanks, symbols, or characters indicating duplication; and displays the access log on a display unit or stores it in a storage unit. An access log processing system characterized by the foregoing.

The access log processing system according to claim 1,
The access log storage / retrieval device includes:
An access log storage function unit for processing the access log and storing the access log and the metadata in the access log file and the metadata file, respectively;
With
The access log storage function unit
Collects access logs of the information that each server outputs to the terminal in response to access from each user;
Divides the access logs collected from each server into a plurality of files, one for each initial of the user identification information, provided for each server;
Merges, for each initial, the access logs in the files of the plurality of servers according to the user identification information;
Sorts the merged access logs, in units of user identification information, by time and access identification information;
If the sorted access logs contain duplicate data in the user identification information, the time, or the access identification information, deletes the duplicate data, inserts a duplicate identifier indicating the deletion, creates access log data for each user identification information, and stores it in the access log file for the corresponding initial of the user identification information; and
Creates, based on the access log data for each user identification information, metadata including the user identification information, the name of the access log file in which the data is stored, and the data storage location information, and stores the metadata in the metadata file. An access log processing system characterized by the foregoing.
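The split, merge, and sort steps of claim 2 can be sketched as follows. This is an illustration under assumptions not stated in the claim: each row is a tuple whose first three columns are user ID, time, and URL, and times are ISO 8601 strings so that string order matches chronological order. The function names are hypothetical.

```python
from collections import defaultdict


def split_by_initial(rows):
    """Bucket one server's rows by the first character of the user ID (column 0)."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[0][0]].append(row)
    return buckets


def merge_and_sort(per_server_buckets):
    """Merge the same-initial buckets of all servers, then sort each merged bucket.

    Sort order is user ID, then time, then URL (columns 0, 1, 2), so each user's
    rows become contiguous and chronological before duplicate deletion.
    """
    merged = defaultdict(list)
    for buckets in per_server_buckets:
        for initial, rows in buckets.items():
            merged[initial].extend(rows)
    return {initial: sorted(rows, key=lambda r: (r[0], r[1], r[2]))
            for initial, rows in merged.items()}
```

Sorting by user first is what lets the later duplicate deletion and key-unit compression treat each user's rows as one contiguous run.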

In the access log processing system according to claim 1 or 2,
The access log storage / retrieval device includes:
An access log search function unit that, in response to an access log search request, searches the access log file and the metadata file for the access log and the metadata, respectively;
The access log search function unit:
Receives a search processing request specifying user identification information from the terminal of an access log searcher;
Searches the metadata file based on the user identification information included in the received request and extracts the access log file name and the data storage location information from the metadata;
Reads the access log from the access log file having the extracted access log file name, according to the extracted data storage location information; and
In accordance with the duplicate identifiers in the access log, restores the deleted duplicate data to create an access log, or creates, without restoring the deleted duplicate data, an access log containing blanks, symbols, or characters indicating duplication; and displays the access log on a display unit or stores it in a storage unit. An access log processing system characterized by the foregoing.

In the access log processing system according to any one of claims 1 to 3,
The access log processing system, wherein the data storage position information is a data storage start position and a data storage size, or a data storage start position and a data storage end position.

In the access log processing system according to any one of claims 1 to 4,
The access log processing system, wherein the user identification information includes any one of a user-specific ID, a login ID, a user telephone number, and a user mail address.

In the access log processing system according to any one of claims 1 to 5,
Wherein the access identification information is a URL, and duplicates are deleted separately for the scheme data, authority data, host data, port data, and path data. An access log processing system characterized by the foregoing.

The access log processing system according to any one of claims 1 to 6,
Wherein the data from which duplicate data has been deleted is compressed and encoded in units of user identification information and stored in the access log file, and
the access log read from the access log file is decoded and decompressed. An access log processing system characterized by the foregoing.

Multiple servers providing web or content or other information;
A terminal of a user accessing the server;
In an access log processing system comprising an access log storage / retrieval device for processing an access log recording access to the server from a user,
The access log storage / retrieval device includes:
Corresponding to the user identification information, a plurality of access log files, divided by one or more initial characters of the user identification information, that store access logs including the time and the access identification information of the Web, content, or other information;
Corresponding to the user identification information, a metadata file that stores metadata including an access log file name and access log data storage location information;
An access log processing method in the access log processing system comprising:

The access log storage / retrieval device includes:
Collects access logs of the information that each server outputs to the terminal in response to access from each user;
When duplicate data exists among a plurality of access logs in any of the user identification information, the time, and the access identification information, deletes each item of duplicate data and creates one or more items of access log data for each user identification information;
Creates, based on the one or more items of access log data for each user identification information, metadata including the access log file name and the data storage location information for each user identification information;
Stores the one or more items of access log data for each user identification information in the access log file for the corresponding initial of the user identification information, and stores the metadata for each user identification information in the metadata file;
and,
When a search processing request specifying user identification information is received, searches the metadata file based on the user identification information included in the received request and extracts the access log file name and the data storage location information from the metadata;
Reads the access log from the access log file having the extracted access log file name, according to the extracted data storage location information;
Restores the deleted duplicate data to create an access log, or creates, without restoring the deleted duplicate data, an access log containing blanks, symbols, or characters indicating duplication; and displays the access log on a display unit or stores it in a storage unit. An access log processing method characterized by the foregoing.

Multiple servers providing web or content or other information;
A terminal of a user accessing the server;
An access log storage / retrieval device for processing an access log recording access to the server from a user;
Corresponding to the user identification information, a plurality of access log files, divided by one or more initial characters of the user identification information, that store access logs including the time and the access identification information of the Web, content, or other information;
Corresponding to the user identification information, a metadata file that stores metadata including an access log file name and access log data storage location information;
An access log processing program in an access log processing system comprising:
A procedure of collecting access logs of the information that each server outputs to the terminal in response to access from each user;
A procedure of, when duplicate data exists among a plurality of access logs in any of the user identification information, the time, and the access identification information, deleting each item of duplicate data and creating one or more items of access log data for each user identification information;
A procedure of creating, based on the one or more items of access log data for each user identification information, metadata including the access log file name and the data storage location information for each user identification information;
A procedure of storing the one or more items of access log data for each user identification information in the access log file for the corresponding initial of the user identification information, and storing the metadata for each user identification information in the metadata file;
and,
A procedure of, when a search processing request specifying user identification information is received, searching the metadata file based on the user identification information included in the received request and extracting the access log file name and the data storage location information from the metadata;
A procedure of reading the access log from the access log file having the extracted access log file name, according to the extracted data storage location information;
A procedure of restoring the deleted duplicate data to create an access log, or creating, without restoring the deleted duplicate data, an access log containing blanks, symbols, or characters indicating duplication, and displaying the access log on a display unit or storing it in a storage unit. An access log processing program for causing the access log storage / retrieval apparatus to execute the above procedures.

An access log storage and retrieval device for processing an access log recording access from a user to a plurality of servers that provide web or content or other information,

Corresponding to the user identification information, a plurality of access log files, divided by one or more initial characters of the user identification information, that store access logs including the time and the access identification information of the Web, content, or other information;
Corresponding to the user identification information, a metadata file that stores metadata including an access log file name and access log data storage location information;
An access log storage function unit for processing the access log and storing the access log and the metadata in the access log file and the metadata file, respectively;
An access log search function unit that, in response to an access log search request, searches the access log file and the metadata file for the access log and the metadata, respectively;
The access log storage function unit
Collects access logs of the information that each server outputs to the terminal in response to access from each user;
When duplicate data exists among a plurality of access logs in any of the user identification information, the time, and the access identification information, deletes each item of duplicate data and creates one or more items of access log data for each user identification information;
Creates, based on the one or more items of access log data for each user identification information, metadata including the access log file name and the data storage location information for each user identification information;
Stores the one or more items of access log data for each user identification information in the access log file for the corresponding initial of the user identification information, and stores the metadata for each user identification information in the metadata file;
and,
The access log search function unit:
When a search processing request specifying user identification information is received, searches the metadata file based on the user identification information included in the received request and extracts the access log file name and the data storage location information from the metadata;
Reads the access log from the access log file having the extracted access log file name, according to the extracted data storage location information;
Restores the deleted duplicate data to create an access log, or creates, without restoring the deleted duplicate data, an access log containing blanks, symbols, or characters indicating duplication; and displays the access log on a display unit or stores it in a storage unit. An access log storage / retrieval device characterized by the foregoing.