CN101267338B

CN101267338B - High-performance log and behavior auditing system

Info

Publication number: CN101267338B
Application number: CN2008100604862A
Authority: CN
Inventors: 黄艺海
Original assignee: HANGZHOU SAFETYBASE INFORMATION TECHNOLOGY Co Ltd
Current assignee: HANGZHOU SAFETYBASE INFORMATION TECHNOLOGY Co Ltd
Priority date: 2008-04-23
Filing date: 2008-04-23
Publication date: 2010-10-13
Anticipated expiration: 2028-04-23
Also published as: CN101267338A

Abstract

The invention discloses a high-performance log and behavior auditing system. The main body of the system uses a three-layer structure composed of a data collection layer, a data analysis layer, and a data storage and retrieval layer. The layers communicate over encrypted TCP, and each layer can be expanded independently to improve the performance of that layer. The invention is based entirely on embedded systems, with software and hardware specifically optimized for collection. For storing and retrieving log and behavior data, it uses dedicated software-hardware interfaces and algorithms. The modules are designed to remain independent of one another, which eases expansion and prevents any single module from becoming a bottleneck.

Description

High-performance log and behavior auditing system

Technical field

The invention belongs to the field of log and behavior auditing, and specifically relates to a high-performance log and behavior auditing system.

Background technology

Existing log and behavior auditing systems, when collecting, analyzing, and storing data, mainly use Linux/Unix or Windows servers together with a general-purpose relational database to build the system platform, and then develop applications for data collection, analysis, and storage on top of that platform. For an environment with data volumes as large as those of logs and behavior records, however, such a platform often cannot meet performance requirements for collection, analysis, and retrieval. The reasons are: (1) to meet business and stability needs, a server runs many services and daemons, most of which are unnecessary for an auditing application; (2) a general-purpose relational database, besides providing data storage and query, must also provide many data-association functions, so it cannot deliver the fast retrieval that auditing requires over data whose relationships are simple but whose volume is huge; (3) the applications developed for collection, analysis, and storage often cannot be made fully independent, so once a bottleneck arises in one module, the whole system becomes difficult to use.

Summary of the invention

The present invention addresses the above technical problems of the prior art and provides a high-performance log and behavior auditing system.

The above technical problems are mainly solved by the following technical solution: the main body of the system adopts a three-layer design consisting of a data collection layer, a data analysis layer, and a data storage and retrieval layer. Encrypted TCP transmission is used between the layers, and each layer can be expanded separately, thereby improving the performance of that individual layer.

The data collection layer uses an industrial control board with PCI-E network interfaces, configured with an Intel dual-core CPU, DDR2 memory, and a DOM (Disk on Module) disk. The Linux kernel is trimmed so that only the necessary drivers and modules remain. A boot image is built so that, once started, the system runs entirely in memory, which speeds up system operation. The capture program runs in this layer and uses a zero-copy technique; at the initial stage of data collection it filters out most useless data, reducing the load of analysis and copying. The collected data is formatted into a unified structure and sent to the data analysis layer over encrypted TCP.
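The filter-then-normalize step of the collection layer can be sketched as follows. This is only an illustration under stated assumptions: the source does not specify the fields of the unified structure or the filter rules, so the layout in `HEADER`, the port set `AUDITED_PORTS`, and all function names are hypothetical. Zero-copy capture and the encrypted TCP transport are outside the scope of the sketch.

```python
import struct

# Hypothetical unified record layout (the source does not specify the fields):
# sequence number, timestamp, source IPv4 address, destination port, message length.
HEADER = struct.Struct("!QIIHH")

# Illustrative filter rule: only these destination ports are of audit interest.
AUDITED_PORTS = {21, 22, 80, 443}


def keep(event: dict) -> bool:
    """Early filter: discard events the audit has no use for."""
    return event["dst_port"] in AUDITED_PORTS and bool(event["message"])


def normalize(seq: int, event: dict) -> bytes:
    """Pack one captured event into the unified binary structure."""
    msg = event["message"].encode("utf-8")
    if len(msg) > 0xFFFF:
        raise ValueError("message too long for the 16-bit length field")
    header = HEADER.pack(seq, event["ts"], event["src_ip"], event["dst_port"], len(msg))
    return header + msg


def decode(blob: bytes) -> dict:
    """Inverse of normalize(), as a receiving layer might apply it."""
    seq, ts, src_ip, dst_port, n = HEADER.unpack_from(blob)
    msg = blob[HEADER.size:HEADER.size + n].decode("utf-8")
    return {"seq": seq, "ts": ts, "src_ip": src_ip, "dst_port": dst_port, "message": msg}


def collect(events):
    """Filter first, then normalize only what survives the filter."""
    kept = (e for e in events if keep(e))
    return [normalize(seq, e) for seq, e in enumerate(kept, start=1)]
```

Filtering before packing means that discarded events never incur the cost of formatting or transmission, which is the point of filtering at the initial stage of collection.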

The data analysis layer uses a server motherboard, configured with an AMD Athlon 64 dual-core CPU and SATA hard disks in RAID 5. The Linux kernel is trimmed so that only the necessary drivers and modules remain. A boot image is built so that, once started, the system runs entirely in memory, which speeds up system operation. The data analysis module runs in this layer and receives data from the collection layer. It analyzes each record against a tree-type rule base, takes the corresponding action, and then sends the data to the data storage and retrieval layer over encrypted TCP.
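One way to picture a tree-type rule base is a tree whose nodes each carry a condition and an optional action, where child rules are examined only when the parent's condition holds. The sketch below is a minimal illustration under that assumption; the source does not describe the rule format, so `RuleNode`, `evaluate`, and the action names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RuleNode:
    """One node of a hypothetical rule tree."""
    name: str
    predicate: Callable[[dict], bool]   # condition tested against a record
    action: Optional[str] = None        # action to take when the condition holds
    children: List["RuleNode"] = field(default_factory=list)


def evaluate(node: RuleNode, record: dict) -> List[str]:
    """Walk the tree depth-first and collect the actions of every matching node.

    A subtree is skipped entirely when its root does not match, so a record
    is tested only against the rules that can still apply to it.
    """
    if not node.predicate(record):
        return []
    actions = [node.action] if node.action else []
    for child in node.children:
        actions.extend(evaluate(child, record))
    return actions


# Example rule tree: all traffic -> SSH sessions -> SSH logins as root.
rules = RuleNode(
    name="all",
    predicate=lambda r: True,
    children=[
        RuleNode(
            name="ssh",
            predicate=lambda r: r.get("dst_port") == 22,
            action="log",
            children=[
                RuleNode(
                    name="ssh-root",
                    predicate=lambda r: r.get("user") == "root",
                    action="alert",
                )
            ],
        )
    ],
)
```

With this tree, `evaluate(rules, {"dst_port": 22, "user": "root"})` yields both actions, while a record for another port is rejected at the first child without testing the deeper rule.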

The data storage and retrieval layer uses a server motherboard, configured with an AMD Athlon 64 dual-core CPU and SATA hard disks in RAID 5. The Linux kernel is trimmed so that only the necessary drivers and modules remain. A boot image is built so that, once started, the system runs entirely in memory, which speeds up system operation. The storage and retrieval program runs on this system and receives data from the data analysis layer. Received data is first stored in a ramdisk and, once a certain amount has accumulated, is written to the hard disk in batches. When writing, each structure is first converted to a binary stream, and then the unique sequence number of the structure and its binary stream are written to the raw data file in one-to-one correspondence.

After the data has been written to the hard disk, an index file is built for each field of the unified structure so that retrieval is efficient. The fields of the structure fall into two kinds of data, numeric and string. The two kinds differ not only in how the computer processes them but also in how they need to be retrieved (for example, numeric fields are generally queried with greater-than, less-than, and equal-to comparisons, whereas string fields are queried for containing or not containing a substring). Accordingly, the index file that is built differs by type.

A numeric index file is built in binary form and contains two kinds of information: the value of the corresponding field in the structure, and the sequence number of the corresponding structure (the sequence number is unique for each record). At retrieval time, searching the corresponding index file yields the set of sequence numbers that satisfy the condition, and with this set the data can be obtained from the raw data file.

For a string index file, the hash value of the string is computed first. The file contains three kinds of information: the original string of the corresponding field in the structure, the hash value of that string, and the sequence number of the corresponding structure (the sequence number is unique for each record). At retrieval time, searching the corresponding index file yields the set of sequence numbers that satisfy the condition, and with this set the data can be obtained from the raw data file.
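The raw data file and the two index types can be sketched in memory as follows. This is an illustration only, with several assumptions that are not in the source: the byte layouts `REC_HDR` and `NUM_ENTRY`, the use of CRC-32 as the string hash, and the `offsets` map used to locate a record by sequence number are all hypothetical choices, and a `bytearray` stands in for the file on disk.

```python
import struct
import zlib

REC_HDR = struct.Struct("<QI")    # assumed layout: sequence number, payload length
NUM_ENTRY = struct.Struct("<qQ")  # assumed layout: field value, sequence number


def append_raw(raw: bytearray, offsets: dict, seq: int, payload: bytes) -> None:
    """Append (sequence number, binary stream) to the raw data 'file'."""
    offsets[seq] = len(raw)  # assumption: remember where each record starts
    raw.extend(REC_HDR.pack(seq, len(payload)) + payload)


def read_raw(raw: bytearray, offsets: dict, seq: int) -> bytes:
    """Fetch one record's binary stream by its sequence number."""
    pos = offsets[seq]
    stored_seq, n = REC_HDR.unpack_from(raw, pos)
    if stored_seq != seq:
        raise ValueError("offset map and raw data file disagree")
    start = pos + REC_HDR.size
    return bytes(raw[start:start + n])


def build_numeric_index(pairs) -> bytes:
    """Binary index: fixed-width (value, sequence number) entries."""
    return b"".join(NUM_ENTRY.pack(value, seq) for value, seq in pairs)


def query_numeric(index: bytes, lo: int, hi: int) -> set:
    """Sequence numbers whose field value lies in [lo, hi]."""
    return {seq for value, seq in NUM_ENTRY.iter_unpack(index) if lo <= value <= hi}


def build_string_index(pairs) -> list:
    """String index entries: (original string, hash value, sequence number)."""
    return [(text, zlib.crc32(text.encode("utf-8")), seq) for text, seq in pairs]


def query_equals(index: list, needle: str) -> set:
    """Exact match: compare hashes first, then confirm on the original string."""
    h = zlib.crc32(needle.encode("utf-8"))
    return {seq for text, hv, seq in index if hv == h and text == needle}


def query_contains(index: list, needle: str) -> set:
    """Substring match: requires the original string kept in the index."""
    return {seq for text, _hv, seq in index if needle in text}
```

Keeping the original string next to its hash serves both query styles: the hash gives a cheap comparison for exact matches, and the stored string supports contains and does-not-contain queries, which a hash alone cannot answer.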

The high-performance log and behavior auditing system of the present invention is designed entirely on embedded systems. Software and hardware are specifically optimized for collection; dedicated software-hardware interfaces and algorithms are used for storing and retrieving log and behavior data; and the modules are designed to remain independent, which eases expansion and prevents any single module from becoming a bottleneck.

Description of drawings

Fig. 1 is a structural schematic diagram of the present invention.

Embodiment

The technical solution of the present invention is described in further detail below through an embodiment and with reference to the accompanying drawing.

Embodiment: referring to Fig. 1, the main body of the system adopts a three-layer design consisting of a data collection layer, a data analysis layer, and a data storage and retrieval layer. Encrypted TCP transmission is used between the layers, and each layer can be expanded separately, thereby improving the performance of that individual layer.

Throughout this series of processes, the step that consumes the most performance, and that also has the highest performance requirements, is storage and retrieval. During storage and retrieval the hard disk is read and written frequently. To improve performance here, the system uses a design in which software cooperates with hardware. First, the software writes to disk in batches wherever possible. Second, when writing in batches, the data to be written is split according to the DMA write block size. Finally, when reading and writing the hard disk, a suitable algorithm is used to minimize the number of disk seeks. Because these read/write designs are closely tied to the hardware and the operating system kernel, efficiency can only be ensured by customizing the kernel and hardware.
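The three measures above can be illustrated with a small sketch. It is not the patented implementation: the source names neither the block size nor the seek-reduction algorithm, so `BLOCK_SIZE`, the flush threshold, and the choice of sorting requests by offset are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

BLOCK_SIZE = 4096  # hypothetical DMA write-block size in bytes


def split_into_blocks(batch: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut a batch into chunks no larger than the write-block size."""
    return [batch[i:i + block_size] for i in range(0, len(batch), block_size)]


def order_requests(requests: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Sort (offset, length) requests by offset.

    Assumed seek-reduction strategy: serving requests in ascending offset
    order lets the disk head sweep in one direction instead of jumping.
    """
    return sorted(requests)


class BatchBuffer:
    """Accumulates records in memory and releases them in block-sized chunks."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold  # bytes to accumulate before flushing
        self._pending = bytearray()

    def add(self, record: bytes) -> List[bytes]:
        """Buffer one record; return chunks to write once the threshold is reached."""
        self._pending.extend(record)
        if len(self._pending) < self.threshold:
            return []
        chunks = split_into_blocks(bytes(self._pending))
        self._pending.clear()
        return chunks
```

For example, with a threshold of 10,000 bytes, three 3,000-byte records produce no writes, and the fourth triggers a single flush of three chunks (4,096, 4,096, and 3,808 bytes).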

Finally, it should be noted that the above embodiment is merely a representative example of the present invention. Obviously, the technical solution of the present invention is not limited to the above embodiment, and many variations are possible. All variations that a person of ordinary skill in the art can directly derive or conceive from the disclosure of the present invention shall be considered within the protection scope of the present invention.

Claims

1. A high-performance log and behavior auditing system, characterized in that the main body of the system adopts a three-layer design consisting of a data collection layer, a data analysis layer, and a data storage and retrieval layer, encrypted TCP transmission is used between the layers, and each layer can be expanded separately, thereby improving the performance of that individual layer; wherein

Data collection layer: uses an industrial control board with PCI-E network interfaces, configured with an Intel dual-core CPU, DDR2 memory, and a DOM disk; the Linux kernel is trimmed so that only the necessary drivers and modules remain; a boot image is built so that, once started, the system runs entirely in memory, to speed up system operation; the capture program runs in this layer and uses a zero-copy technique, filtering out most useless data at the initial stage of data collection to reduce the load of analysis and copying; the collected data is formatted into a unified structure and sent to the data analysis layer over encrypted TCP;

Data analysis layer: uses a server motherboard, configured with an AMD Athlon 64 dual-core CPU and SATA hard disks in RAID 5; the Linux kernel is trimmed so that only the necessary drivers and modules remain; a boot image is built so that, once started, the system runs entirely in memory, to speed up system operation; the data analysis module runs in this layer and receives data from the collection layer, analyzes each record against a tree-type rule base, takes the corresponding action, and then sends the data to the data storage and retrieval layer over encrypted TCP;

Data storage and retrieval layer: uses a server motherboard, configured with an AMD Athlon 64 dual-core CPU and SATA hard disks in RAID 5; the Linux kernel is trimmed so that only the necessary drivers and modules remain; a boot image is built so that, once started, the system runs entirely in memory, to speed up system operation; the storage and retrieval program runs on this system and receives data from the data analysis layer; received data is first stored in a ramdisk and, once a certain amount has accumulated, is written to the hard disk in batches; when writing, each structure is first converted to a binary stream, and then the unique sequence number of the structure and its binary stream are written to the raw data file in one-to-one correspondence;

In the data storage and retrieval layer, after the data received from the data analysis layer has been written to the hard disk in batches, an index file is built for each field of the unified structure so that retrieval is efficient; the fields of the structure fall into two kinds of data, numeric and string; a numeric index file is built in binary form and contains two kinds of information, namely the value of the corresponding field in the structure and the sequence number of the corresponding structure; at retrieval time, searching the corresponding index file yields the set of sequence numbers that satisfy the condition, and with this set the data can be obtained from the raw data file; for a string index file, the hash value of the string is computed first, and the file contains three kinds of information, namely the original string of the corresponding field in the structure, the hash value of that string, and the sequence number of the corresponding structure; at retrieval time, searching the corresponding index file yields the set of sequence numbers that satisfy the condition, and with this set the data can be obtained from the raw data file.