JPWO2013137070A1 - Log compression system, log compression method, and program

Info

Publication number: JPWO2013137070A1
Application number: JP2014504812A
Authority: JP
Inventors: 和久古牧
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2012-03-13
Filing date: 2013-03-05
Publication date: 2015-08-03
Also published as: WO2013137070A1

Abstract

A log compression system includes: a frequency detection unit that detects the most frequently occurring element among the elements contained in log data; an element presence determination unit that determines, for each log record in the log data, whether the record contains that most frequent element; a dividing unit that divides the log data into a group of records that contain the most frequent element and a group of records that do not; a table compression preprocessing unit that converts the divided log data into a data structure that a general-purpose compression algorithm can compress easily; and a compression processing unit that compresses the converted log data.

Description

The present invention relates to a log compression system, a log compression method, and a program.

Event logs generated by system components are monitored in order to watch for threats such as information leaks, unauthorized use, and system malfunctions. One purpose of monitoring event logs is to investigate the cause of a threat after it has been detected. However, the volume of logs generated by a monitored system is enormous, so collecting the logs as they are requires a large storage capacity to retain them and consumes many resources (network capacity, CPU resources, and so on) to collect them.

Patent Document 1 describes a method for compressing time-series observation data by taking the difference from the immediately preceding data.

JP-A-3-55919 (Japanese Unexamined Patent Application Publication No. H3-55919)

However, unlike time-series observation data, which changes relatively little, the event logs generated by system components are output asynchronously in the order in which multiple events occur. If the logs are left in the order in which they were acquired, a compression method such as that of Patent Document 1 cannot compress them sufficiently.

An object of the present invention is therefore to enable compression at a higher compression ratio for logs that are difficult to compress when left in the order in which they were acquired.

A log compression system according to the present invention includes: a frequency detection unit that detects the most frequently occurring element among the elements contained in log data; an element presence determination unit that determines, for each log record in the log data, whether the record contains the most frequent element; a dividing unit that divides the log data into a group of records that contain the most frequent element and a group of records that do not; a table compression preprocessing unit that converts the divided log data into a data structure that a general-purpose compression algorithm can compress easily; and a compression processing unit that compresses the log data converted into that data structure.

A log compression method according to the present invention includes the steps of: detecting the most frequently occurring element among the elements contained in log data; determining, for each log record in the log data, whether the record contains the most frequent element; dividing the log data into a group of records that contain the most frequent element and a group of records that do not; converting the divided log data into a data structure that a general-purpose compression algorithm can compress easily; and compressing the log data converted into that data structure.

A program according to the present invention causes a computer to function as: a frequency detection unit that detects the most frequently occurring element among the elements contained in log data; an element presence determination unit that determines, for each log record in the log data, whether the record contains the most frequent element; a dividing unit that divides the log data into a group of records that contain the most frequent element and a group of records that do not; a table compression preprocessing unit that converts the divided log data into a data structure that a general-purpose compression algorithm can compress easily; and a compression processing unit that compresses the log data converted into that data structure.

According to the present invention, logs that are difficult to compress when left in the order in which they were acquired can be compressed at a higher compression ratio.

A block diagram showing the configuration of the log compression system according to Embodiment 1 of the present invention.
A block diagram showing the functional configuration of the storage format changing unit according to Embodiment 1 of the present invention.
A flowchart of the operation of the log compression system according to Embodiment 1 of the present invention.
A diagram explaining the operation of the log compression system according to Embodiment 1 of the present invention.
A diagram explaining the operation of the log compression system according to Embodiment 1 of the present invention.
A diagram explaining the operation of the log compression system according to Embodiment 1 of the present invention.
A diagram showing an example of a log stored in the temporary log storage device according to Embodiment 1 of the present invention.
A block diagram showing the configuration of the log compression system according to Embodiment 2 of the present invention.
A block diagram showing the functional configuration of the restricted storage format changing unit according to Embodiment 2 of the present invention.
A flowchart of the operation of the log compression system according to Embodiment 2 of the present invention.
A diagram explaining the operation of the log compression system according to Embodiment 2 of the present invention.

Embodiment 1.
Next, embodiments for carrying out the present invention are described in detail with reference to the drawings.
FIG. 1 is a block diagram showing the configuration of a log compression system 10 according to Embodiment 1 of the present invention. As shown in the figure, the log compression system 10 includes a log compression device 1, a temporary log storage device 2, and a log storage device 3. The log compression device 1 is connected to the temporary log storage device 2 and the log storage device 3 via communication lines.

The log compression device 1 can be a dedicated or general-purpose computer that has a CPU, memory such as ROM and RAM, an external storage device that stores various kinds of information, an input interface, an output interface, a communication interface, and a bus that connects them. The log compression device 1 may consist of a single computer or of multiple computers connected to one another via communication lines.

As shown in FIG. 1, the log compression device 1 includes a storage format changing unit 11, a table compression preprocessing unit 12, and a compression processing unit 13. These units correspond to functional modules that are realized when the CPU executes a predetermined program stored in the ROM or the like.

FIG. 2 is a block diagram showing the functional configuration of the storage format changing unit 11.
As shown in FIG. 2, the storage format changing unit 11 includes a frequency detection unit 111, an element presence determination unit 112, and a dividing unit 113.

When the storage format changing unit 11 receives a log table from the temporary log storage device 2, the frequency detection unit 111 detects the element that occurs most frequently among the elements contained in that log table. Here, an element is the value of an individual data item contained in the table.

For the detected most frequent element, the element presence determination unit 112 generates division information that includes the element type (data item), the element value (the value of the data item), and information that indicates, for each log record, whether the record contains the element, encoded as 0 (present) or 1 (absent).

The dividing unit 113 divides the log table into a log table that contains the most frequent element and a log table that does not.

The compression processing unit 13 compresses the log that the table compression preprocessing unit 12 has converted into an easily compressible format. A general-purpose compression method such as Gzip or Bzip2 can be used for the compression.

The temporary log storage device 2 stores a log table made up of multiple data items obtained from an external log collection system. It holds these logs temporarily until they have been converted into an easily compressible format and compressed. FIG. 6 shows an example of a log stored in the temporary log storage device 2. As shown in FIG. 6, the stored log consists of multiple data items.

The log storage device 3 stores the logs compressed by the log compression device 1. The temporary log storage device 2 and the log storage device 3 are each implemented by a storage device.

Next, the operation of the log compression system 10 is described with reference to the flowchart of FIG. 3.
First, the frequency detection unit 111 of the storage format changing unit 11 detects the most frequently occurring element among all the elements contained in the log acquired from the temporary log storage device 2 (step S1). For example, when a log table such as (1) in FIG. 4 is obtained, the most frequent element detected by the frequency detection unit 111 is the value "1" of the item "User".
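Step S1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation described in the patent: the table contents and the column names other than User are hypothetical stand-ins for the table of FIG. 4 (1), chosen only so that the value 1 of the item User is the most frequent element, and `most_frequent_element` is an assumed helper name. The sketch treats an element as a (data item, value) pair.

```python
from collections import Counter

# Hypothetical stand-in for the log table of FIG. 4 (1). Only the fact that
# User = 1 is the most frequent element is taken from the description.
LOG_TABLE = [
    {"Time": "10:00", "User": "1", "Action": "read"},
    {"Time": "10:01", "User": "1", "Action": "write"},
    {"Time": "10:02", "User": "2", "Action": "read"},
    {"Time": "10:03", "User": "1", "Action": "delete"},
    {"Time": "10:04", "User": "3", "Action": "write"},
]


def most_frequent_element(records):
    """Return ((data_item, value), count) for the most frequent element.

    Equal values in different columns are counted separately because the
    key is the (data item, value) pair. Ties go to the element seen first.
    """
    counts = Counter(
        (item, value) for record in records for item, value in record.items()
    )
    return counts.most_common(1)[0]


if __name__ == "__main__":
    print(most_frequent_element(LOG_TABLE))
```

Counting (item, value) pairs rather than bare values keeps a "1" in one column from being merged with a "1" in another, which matches the division information carrying both an element type and an element value.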

Next, the element presence determination unit 112 determines whether each row of the log contains the most frequent element and represents the result as 0 (present) or 1 (absent) (step S2).
It also generates division information that includes the element type (data item), the element value (the value of the data item), and the information that indicates, as 0 (present) or 1 (absent), whether each row contains the element.

In the example of FIG. 4, the division information generated by the element presence determination unit 112 is, as shown in (2) of FIG. 4, (Type (element type), Contents (element value), Stream (presence or absence of the element)) = (user, 1, 00101).
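The division information of step S2 can be sketched as below. The five records are hypothetical and were chosen only so that the resulting Stream reproduces the 00101 of the example above; `build_division_info` and the dictionary layout are assumptions made for illustration. Following the description, 0 marks a row that contains the element and 1 marks a row that does not.

```python
# Hypothetical records; only the resulting Stream "00101" comes from the text.
LOG_TABLE = [
    {"Time": "10:00", "User": "1", "Action": "read"},
    {"Time": "10:01", "User": "1", "Action": "write"},
    {"Time": "10:02", "User": "2", "Action": "read"},
    {"Time": "10:03", "User": "1", "Action": "delete"},
    {"Time": "10:04", "User": "3", "Action": "write"},
]


def build_division_info(records, item, value):
    """Build (Type, Contents, Stream) for one element.

    Stream holds one character per record: "0" when the record contains
    the element, "1" when it does not (the convention of the description).
    """
    stream = "".join(
        "0" if record.get(item) == value else "1" for record in records
    )
    return {"Type": item, "Contents": value, "Stream": stream}


if __name__ == "__main__":
    print(build_division_info(LOG_TABLE, "User", "1"))
```

The sketch reports the column name exactly as it appears in the table, so Type is "User" here, whereas the description writes the type in lower case.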

Next, the dividing unit 113 divides the log table into a log table that contains the most frequent element and a log table that does not (step S3).

In the example of FIG. 4, table (1) is divided into log table (3), which contains the most frequent element, and log table (4), which does not.
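A minimal sketch of the split in step S3 follows. The records are hypothetical and `split_records` is an assumed name. The description does not say whether the matched column is dropped from the table that contains the element, so this sketch leaves every record unchanged.

```python
# Hypothetical records standing in for table (1) of FIG. 4.
LOG_TABLE = [
    {"Time": "10:00", "User": "1", "Action": "read"},
    {"Time": "10:01", "User": "1", "Action": "write"},
    {"Time": "10:02", "User": "2", "Action": "read"},
    {"Time": "10:03", "User": "1", "Action": "delete"},
    {"Time": "10:04", "User": "3", "Action": "write"},
]


def split_records(records, item, value):
    """Split records into (containing the element, not containing it).

    The relative order of the records inside each group is preserved.
    """
    with_element = [r for r in records if r.get(item) == value]
    without_element = [r for r in records if r.get(item) != value]
    return with_element, without_element


if __name__ == "__main__":
    table_3, table_4 = split_records(LOG_TABLE, "User", "1")
    print(len(table_3), len(table_4))
```

Because order is preserved within each group, the original row order can be rebuilt from the two groups together with the Stream of the division information.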

The table may be divided only once. Alternatively, the log table that does not contain the most frequent element, and only that table, may be divided further according to the presence or absence of the second most frequent element, the third most frequent element, and so on (repeating steps S1 to S4). The repetition may be interrupted at a suitable point, for example by a user operation, or may continue until the table can no longer be divided.

FIGS. 5A and 5B show an example in which the table has been divided until it can be divided no further. In this case, the entire log can be represented by the division table shown in FIG. 5B.
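The repeated division can be sketched as a loop that keeps splitting only the records that did not contain the previous most frequent element. The description does not define precisely when a table can no longer be divided, so this sketch assumes the loop ends when no records remain; `partition_repeatedly`, `max_rounds`, and the record contents are hypothetical.

```python
from collections import Counter

# Hypothetical records; the real tables are those of FIGS. 4, 5A and 5B.
LOG_TABLE = [
    {"Time": "10:00", "User": "1", "Action": "read"},
    {"Time": "10:01", "User": "1", "Action": "write"},
    {"Time": "10:02", "User": "2", "Action": "read"},
    {"Time": "10:03", "User": "1", "Action": "delete"},
    {"Time": "10:04", "User": "3", "Action": "write"},
]


def partition_repeatedly(records, max_rounds=None):
    """Divide the records round by round; return (groups, leftover records).

    Each round finds the most frequent element among the remaining records,
    records its division information, and continues with the records that
    do not contain it. max_rounds models an early interruption.
    """
    groups = []
    remaining = list(records)
    rounds = 0
    while remaining and (max_rounds is None or rounds < max_rounds):
        counts = Counter((k, v) for r in remaining for k, v in r.items())
        if not counts:  # only empty records are left; nothing to split on
            break
        (item, value), _ = counts.most_common(1)[0]
        stream = "".join("0" if r.get(item) == value else "1" for r in remaining)
        matched = [r for r in remaining if r.get(item) == value]
        remaining = [r for r in remaining if r.get(item) != value]
        groups.append(
            {"Type": item, "Contents": value, "Stream": stream, "Records": matched}
        )
        rounds += 1
    return groups, remaining


if __name__ == "__main__":
    for group in partition_repeatedly(LOG_TABLE)[0]:
        print(group["Type"], group["Contents"], group["Stream"])
```

Every round removes at least one record from the remaining set, so the loop terminates even without `max_rounds`.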

Next, the table compression preprocessing unit 12 performs compression preprocessing on the divided log (step S4).
That is, the log, which the storage format changing unit 11 has converted into an easily compressible storage format, is converted into a format with even higher compression efficiency. Specifically, run-length processing is applied to the part of the division information that indicates whether the most frequent element is present ("Stream" in the example of (2) in FIG. 4).

For example, "Stream" = (000000111110000) can be expressed by run-length encoding as (0:6, 1:5, 0:4).
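The run-length step can be illustrated with a short sketch built on `itertools.groupby` from the Python standard library. The function names and the list-of-pairs representation are assumptions; the description only gives the notation (0:6, 1:5, 0:4).

```python
from itertools import groupby


def run_length_encode(stream):
    """Encode a string such as "0011" as [(symbol, run_length), ...]."""
    return [(symbol, len(list(run))) for symbol, run in groupby(stream)]


def run_length_decode(pairs):
    """Inverse of run_length_encode."""
    return "".join(symbol * length for symbol, length in pairs)


if __name__ == "__main__":
    encoded = run_length_encode("000000111110000")
    # Render in the 0:6, 1:5, 0:4 notation used in the description.
    print(", ".join(f"{symbol}:{length}" for symbol, length in encoded))
```

Run-length encoding pays off only when the Stream has long runs, which is why the preceding division by frequent elements matters: it tends to group identical flags together.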

Next, the compression processing unit 13 compresses the log that the table compression preprocessing unit 12 has converted into an even more easily compressible format, and stores it in the log storage device 3 (step S5).
A general-purpose compression method such as Gzip or Bzip2 can be used for the compression.
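Step S5 can be sketched with the `gzip` and `bz2` modules of the Python standard library. The JSON serialization, the payload layout, and the names `compress_log` and `decompress_log` are assumptions made for illustration; the description does not specify how the preprocessed log is serialized before compression.

```python
import bz2
import gzip
import json

# Hypothetical preprocessed division information with a run-length-encoded
# Stream; JSON stores the pairs as two-element lists.
PAYLOAD = {"Type": "User", "Contents": "1", "Stream": [["0", 6], ["1", 5], ["0", 4]]}


def compress_log(payload, method="gzip"):
    """Serialize the payload as compact JSON and compress it."""
    data = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    if method == "gzip":
        return gzip.compress(data)
    if method == "bzip2":
        return bz2.compress(data)
    raise ValueError(f"unsupported method: {method}")


def decompress_log(blob, method="gzip"):
    """Inverse of compress_log."""
    if method == "gzip":
        raw = gzip.decompress(blob)
    elif method == "bzip2":
        raw = bz2.decompress(blob)
    else:
        raise ValueError(f"unsupported method: {method}")
    return json.loads(raw.decode("utf-8"))


if __name__ == "__main__":
    for method in ("gzip", "bzip2"):
        blob = compress_log(PAYLOAD, method)
        print(method, len(blob), decompress_log(blob, method) == PAYLOAD)
```

For a payload this small the compressed size can exceed the input because of format overhead; any benefit would only show on realistically sized logs.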

As described above, in this embodiment the frequency detection unit 111 detects the element with the highest frequency in the log, the element presence determination unit 112 determines whether each row of the log contains that element, and the dividing unit 113 divides the log into the log that contains the element and the log that does not. Asynchronously occurring events can thereby be classified in terms of how frequently log elements occur. In general, a log that contains a frequently occurring element can be regarded as a log that captures a characteristic of the accesses, because access logs exhibit spatial locality (identical or similar accesses occur repeatedly) and temporal locality (a large number of accesses occur within a short period). In addition, dividing the table according to whether a row contains the most frequent element makes it likely that frequent elements appear consecutively, which yields a format for which run-length encoding is highly effective.

Embodiment 2.
FIG. 7 is a block diagram showing the configuration of a log compression system 40 according to Embodiment 2 of the present invention. The log compression device 4 of Embodiment 2 includes a restricted storage format changing unit 41 in place of the storage format changing unit 11. The rest of the configuration is the same as in Embodiment 1.

FIG. 8 is a block diagram showing the functional configuration of the restricted storage format changing unit 41.
As shown in the figure, the restricted storage format changing unit 41 includes the frequency detection unit 111, the element presence determination unit 112, and a restricted dividing unit 413.

Next, the operation of the log compression system 40 is described with reference to the flowchart of FIG. 9.
Steps S1 and S2 operate in the same way as in Embodiment 1 shown in FIG. 3.

Next, in step S6, the effect of dividing the table is evaluated, and the table is divided only when the division is judged to improve the compression ratio.

The criterion for judging whether the compression ratio will improve can be, for example, the relationship between the number of element types and the number of distinct values after division. In the example shown in FIG. 10, after division there is one element type and seven distinct values. In such a case, dividing further from this state would only produce seven tables, each with a different element value, and would not improve the effect of run-length encoding, so no further division is performed and the process moves on to step S4.
Steps S4 and S5 operate in the same way as in Embodiment 1 shown in FIG. 3.
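One possible reading of the step S6 check is sketched below. The description gives the criterion only by example (one element type with seven distinct values), so the rule used here, skipping the division when no element occurs more than once, is an assumption; `should_split` and the File column are hypothetical names.

```python
from collections import Counter


def should_split(records):
    """Return True when dividing the table is expected to help.

    Assumed rule: if every (data item, value) pair occurs at most once,
    a division would only isolate single records, so it is skipped.
    """
    counts = Counter((k, v) for r in records for k, v in r.items())
    if not counts:
        return False
    _, top_count = counts.most_common(1)[0]
    return top_count > 1


# Hypothetical table modelled on the FIG. 10 situation: one element type
# (column) holding seven distinct values.
SEVEN_DISTINCT = [{"File": f"file{i}"} for i in range(7)]

if __name__ == "__main__":
    print(should_split(SEVEN_DISTINCT))
    print(should_split(SEVEN_DISTINCT + [{"File": "file0"}]))
```

A production criterion could instead compare estimated compressed sizes before and after division; the description leaves that choice open.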

As described above, in this embodiment, when dividing the table by the presence or absence of a frequently occurring element would not improve the compression effect, as with logs that lack the characteristics specific to access logs, compression is performed without division. Unnecessary table division is thereby avoided and processing efficiency improves. Depending on the structure of the log, dividing the table can even reduce the compression effect; Embodiment 2 avoids that situation.

The present invention can be applied to reduce the network load when access logs are sent to an analysis server. It can also be applied to reduce the disk capacity needed when access logs are retained for a fixed period.

This application claims priority based on Japanese Patent Application No. 2012-55800, filed on March 13, 2012, the entire disclosure of which is incorporated herein.

The present invention has been described above with reference to the embodiments, but it is not limited to them. Various changes that a person skilled in the art can understand may be made to the configuration and details of the present invention within its scope.

Some or all of the above embodiments can also be described as in the following supplementary notes, but are not limited to them.
(Supplementary Note 1) A log compression system comprising:
a frequency detection unit that detects the most frequently occurring element among the elements contained in log data;
an element presence determination unit that determines, for each log record in the log data, whether the record contains the most frequent element;
a dividing unit that divides the log data into a group of records that contain the most frequent element and a group of records that do not;
a table compression preprocessing unit that converts the divided log data into a data structure that a general-purpose compression algorithm can compress easily; and
a compression processing unit that compresses the log data converted into that data structure.

(Supplementary Note 2) The log compression system according to Supplementary Note 1, wherein the dividing unit divides the log data only when it determines that dividing the log data will improve the compression effect.

(Supplementary Note 3) The log compression system according to Supplementary Note 1 or 2, wherein the group of records that do not contain the most frequent element is further divided according to the presence or absence of the most frequent element within that group, and the log data after division is compressed.

(Supplementary Note 4) The log compression system according to any one of Supplementary Notes 1 to 3, wherein the table compression preprocessing unit applies run-length processing to the divided log data.

(Supplementary Note 5) A log compression method comprising the steps of:
detecting the most frequently occurring element among the elements contained in log data;
determining, for each log record in the log data, whether the record contains the most frequent element;
dividing the log data into a group of records that contain the most frequent element and a group of records that do not;
converting the divided log data into a data structure that a general-purpose compression algorithm can compress easily; and
compressing the log data converted into that data structure.

(Supplementary Note 6) A program for causing a computer to function as:
a frequency detection unit that detects the most frequently occurring element among the elements contained in log data;
an element presence determination unit that determines, for each log record in the log data, whether the record contains the most frequent element;
a dividing unit that divides the log data into a group of records that contain the most frequent element and a group of records that do not;
a table compression preprocessing unit that converts the divided log data into a data structure that a general-purpose compression algorithm can compress easily; and
a compression processing unit that compresses the log data converted into that data structure.

1, 4 log compression device; 2 temporary log storage device; 3 log storage device; 10, 40 log compression system; 11 storage format changing unit; 12 table compression preprocessing unit; 13 compression processing unit; 41 restricted storage format changing unit; 111 frequency detection unit; 112 element presence determination unit; 113 dividing unit; 413 restricted dividing unit

Claims

A log compression system comprising:
a frequency detection unit that detects the most frequently occurring element among the elements contained in log data;
an element presence determination unit that determines, for each log record in the log data, whether the record contains the most frequent element;
a dividing unit that divides the log data into a group of records that contain the most frequent element and a group of records that do not;
a table compression preprocessing unit that converts the divided log data into a data structure that a general-purpose compression algorithm can compress easily; and
a compression processing unit that compresses the log data converted into that data structure.

The log compression system according to claim 1, wherein the dividing unit divides the log data only when it determines that dividing the log data will improve the compression effect.

The group of records that do not contain the most frequent element is further divided according to the presence or absence of the most frequent element within that group, and the log data after division is compressed. Log compression system described in.

The log compression system according to claim 1, wherein the table compression preprocessing unit applies run-length processing to the divided log data.

A log compression method comprising the steps of:
detecting the most frequently occurring element among the elements contained in log data;
determining, for each log record in the log data, whether the record contains the most frequent element;
dividing the log data into a group of records that contain the most frequent element and a group of records that do not;
converting the divided log data into a data structure that a general-purpose compression algorithm can compress easily; and
compressing the log data converted into that data structure.

A program for causing a computer to function as:
a frequency detection unit that detects the most frequently occurring element among the elements contained in log data;
an element presence determination unit that determines, for each log record in the log data, whether the record contains the most frequent element;
a dividing unit that divides the log data into a group of records that contain the most frequent element and a group of records that do not;
a table compression preprocessing unit that converts the divided log data into a data structure that a general-purpose compression algorithm can compress easily; and
a compression processing unit that compresses the log data converted into that data structure.