CN101267338B - High-performance log and behavior auditing system - Google Patents

High-performance log and behavior auditing system

Info

Publication number
CN101267338B
Authority
CN
China
Prior art keywords
data
layer
file
retrieval
operated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2008100604862A
Other languages
Chinese (zh)
Other versions
CN101267338A (en)
Inventor
黄艺海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
HANGZHOU SAFETYBASE INFORMATION TECHNOLOGY Co Ltd
Original Assignee
HANGZHOU SAFETYBASE INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by HANGZHOU SAFETYBASE INFORMATION TECHNOLOGY Co Ltd
Priority to CN2008100604862A
Publication of CN101267338A
Application granted
Publication of CN101267338B
Legal status: Active
Anticipated expiration

Abstract

The invention discloses a high-performance log and behavior auditing system. The main body of the system uses a three-layer structure composed of a data collection layer, a data analysis layer, and a data storage and retrieval layer. The layers communicate over encrypted TCP, and each layer can be expanded independently to raise the performance of that layer. The system is built entirely on embedded systems, with software and hardware specifically optimized for collection. Log and behavior data are stored and retrieved through dedicated software-hardware interfaces and algorithms, and the modules are designed to remain independent so that they can be expanded, so no single module becomes a bottleneck.

Description

High-performance log and behavior auditing system
Technical field
The invention belongs to the field of log and behavior auditing, and specifically relates to a high-performance log and behavior auditing system.
Background technology
Existing log and behavior auditing systems, when collecting, analyzing, and storing data, mainly use Linux/Unix or Windows servers with a general-purpose relational database as the platform, and build the collection, analysis, and storage applications on top of it. For the large data volumes typical of logs and behavior records, however, such a platform often cannot meet the performance requirements of collection, analysis, and retrieval. The reasons are: 1. To meet business and stability needs, a server runs many services and daemons, most of which are unnecessary for an auditing application. 2. A general-purpose relational database provides not only data storage and query but also many data-association functions, so it cannot deliver the fast retrieval that auditing needs, where data relationships are simple but the data volume is huge. 3. The collection, analysis, and storage parts of such applications are often not fully independent, so once one module becomes a bottleneck the whole system becomes hard to use.
Summary of the invention
The present invention addresses the above technical problems of the prior art and provides a high-performance log and behavior auditing system.
The above technical problems are mainly solved by the following technical scheme: the main body of the system adopts a three-layer design consisting of a data collection layer, a data analysis layer, and a data storage and retrieval layer. The layers communicate over encrypted TCP, and each layer can be expanded independently, thereby improving the performance of an individual layer.
The data collection layer uses an industrial control board with PCI-E network interfaces, equipped with an Intel dual-core CPU, DDR2 memory, and a DOM (disk-on-module) disk. The Linux kernel is trimmed so that only the necessary drivers and modules remain. A boot image is built, and after startup the system runs entirely in memory to speed up operation. The capture program runs in this layer and uses zero-copy techniques; at the initial stage of data collection it filters out most of the unwanted data, reducing the load of analysis and copying. The collected data is formatted into a unified structure and sent to the data analysis layer over encrypted TCP.
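As a rough illustration of the last two steps of the collection layer (early filtering and formatting into a unified structure), the following Python sketch may help. The patent does not disclose the fields of the unified structure or the filter criteria, so the field names (`seq`, `timestamp`, `src_ip`, `dst_port`, `message`), the byte layout, and the port-based filter are all assumptions made for illustration only.

```python
import struct
from dataclasses import dataclass

# Hypothetical fixed header: seq (u64), timestamp (u32), src_ip (u32),
# dst_port (u16), message length (u16), all in network byte order.
_HEADER = struct.Struct("!QIIHH")


@dataclass
class LogRecord:
    """Assumed shape of the 'unified structure'; the real fields are not given."""
    seq: int
    timestamp: int
    src_ip: int
    dst_port: int
    message: str

    def to_bytes(self) -> bytes:
        # Header followed by the UTF-8 message bytes.
        payload = self.message.encode("utf-8")
        header = _HEADER.pack(self.seq, self.timestamp, self.src_ip,
                              self.dst_port, len(payload))
        return header + payload

    @classmethod
    def from_bytes(cls, data: bytes) -> "LogRecord":
        seq, ts, ip, port, n = _HEADER.unpack_from(data)
        start = _HEADER.size
        return cls(seq, ts, ip, port, data[start:start + n].decode("utf-8"))


def keep(record: LogRecord, watched_ports=frozenset({22, 80, 443})) -> bool:
    """Early filter (assumed policy): keep only traffic to audited ports."""
    return record.dst_port in watched_ports
```

Records that pass `keep` would then be serialized with `to_bytes` and sent on to the analysis layer; the encrypted TCP transport itself is not sketched here because the patent does not name a cipher or protocol.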
The data analysis layer uses a server motherboard equipped with a 64-bit dual-core AMD Athlon CPU and SATA hard disks in RAID 5. The Linux kernel is trimmed so that only the necessary drivers and modules remain. A boot image is built, and after startup the system runs entirely in memory to speed up operation. The data analysis module runs in this layer and receives data from the collection layer. It analyzes each record against a tree-structured rule base, takes the corresponding action, and then sends the data to the data storage and retrieval layer over encrypted TCP.
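One way to read "tree-structured rule base" is a tree of predicates in which a child rule is only evaluated when its parent matches, so whole subtrees are skipped for records that fail an early test. The sketch below follows that reading; the `RuleNode` class, the rule names, and the `"log"`/`"alert"` actions are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class RuleNode:
    """One node of a hypothetical rule tree."""
    name: str
    predicate: Callable[[dict], bool]
    action: Optional[str] = None
    children: list = field(default_factory=list)


def evaluate(node: RuleNode, record: dict) -> list:
    """Collect actions of all matching nodes; a failed node prunes its subtree."""
    if not node.predicate(record):
        return []
    actions = [node.action] if node.action else []
    for child in node.children:
        actions.extend(evaluate(child, record))
    return actions


# Example rule base (illustrative only).
rules = RuleNode("any", lambda r: True, children=[
    RuleNode("ssh", lambda r: r["dst_port"] == 22, action="log", children=[
        RuleNode("ssh-root", lambda r: "root" in r["message"], action="alert"),
    ]),
    RuleNode("http", lambda r: r["dst_port"] in (80, 443), action="log"),
])
```

With this layout, a record that is not SSH traffic never reaches the `ssh-root` test, which is the main reason to prefer a tree over a flat rule list when most records match few rules.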
The data storage and retrieval layer uses a server motherboard equipped with a 64-bit dual-core AMD Athlon CPU and SATA hard disks in RAID 5. The Linux kernel is trimmed so that only the necessary drivers and modules remain. A boot image is built, and after startup the system runs entirely in memory to speed up operation. The storage and retrieval program runs on this system and receives data from the data analysis layer. Received data is first stored in a ramdisk and, once a certain amount has accumulated, written to the hard disk in batches. On writing, each structure is first converted to a binary stream, and the structure's unique sequence number and its binary stream are then written to the raw data file in one-to-one correspondence.
After the data is written to disk, an index file is built for each field of the unified structure so that retrieval is efficient. The fields of the structure fall into two types, numeric and string. The two types differ not only in how the computer processes them but also in how they need to be retrieved (for example, numeric fields are usually queried with greater-than, less-than, and equal-to, while string fields are queried with contains or does-not-contain). The index files therefore differ according to the field type.
A numeric index file is built in binary form and contains two kinds of information: the value of the corresponding field in the structure, and the sequence number of the corresponding structure (the sequence number is unique for each record). At retrieval time, searching the relevant index file yields the set of sequence numbers that satisfy the condition, and this set is used to fetch the data from the raw data file.
For a string index file, the hash value of the string is computed first. The file contains three kinds of information: the original string of the corresponding field in the structure, the hash value of that string, and the sequence number of the corresponding structure (the sequence number is unique for each record). At retrieval time, searching the relevant index file yields the set of sequence numbers that satisfy the condition, and this set is used to fetch the data from the raw data file.
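The two index layouts described above can be sketched in Python as follows. This is a minimal illustration under several assumptions: the entry layouts (`NUM_ENTRY`, `STR_HEAD`) are invented, CRC-32 stands in for the unnamed hash function, the hash is assumed to serve as a shortcut for exact-match queries (the patent does not say how it is used), and queries are plain linear scans rather than whatever search algorithm the real system uses.

```python
import struct
import zlib

NUM_ENTRY = struct.Struct("!qQ")   # (field value, sequence number)
STR_HEAD = struct.Struct("!IQH")   # (hash, sequence number, string byte length)


def build_numeric_index(rows) -> bytes:
    """rows: iterable of (seq, value). Returns the binary numeric index."""
    return b"".join(NUM_ENTRY.pack(value, seq) for seq, value in rows)


def query_numeric(index: bytes, op: str, target: int) -> set:
    """Return sequence numbers whose value satisfies <, > or == target."""
    tests = {"<": lambda v: v < target,
             ">": lambda v: v > target,
             "==": lambda v: v == target}
    return {seq for value, seq in NUM_ENTRY.iter_unpack(index) if tests[op](value)}


def str_hash(text: str) -> int:
    # Stand-in hash; the patent does not name the hash function.
    return zlib.crc32(text.encode("utf-8"))


def build_string_index(rows) -> bytes:
    """rows: iterable of (seq, text). Each entry stores hash, seq and the string."""
    out = bytearray()
    for seq, text in rows:
        raw = text.encode("utf-8")
        out += STR_HEAD.pack(str_hash(text), seq, len(raw)) + raw
    return bytes(out)


def iter_string_index(index: bytes):
    pos = 0
    while pos < len(index):
        h, seq, n = STR_HEAD.unpack_from(index, pos)
        pos += STR_HEAD.size
        yield h, seq, index[pos:pos + n].decode("utf-8")
        pos += n


def query_equals(index: bytes, text: str) -> set:
    """Exact match: compare hashes first, confirm with the stored string."""
    h = str_hash(text)
    return {seq for eh, seq, s in iter_string_index(index) if eh == h and s == text}


def query_contains(index: bytes, needle: str, negate: bool = False) -> set:
    """Contains / does-not-contain query over the stored original strings."""
    return {seq for _, seq, s in iter_string_index(index) if (needle in s) != negate}
```

In both cases the query result is a set of sequence numbers, which is then used to look up the full records in the raw data file.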
The design of the high-performance log and behavior auditing system of the present invention is based entirely on embedded systems. Software and hardware are specifically optimized for collection; dedicated software-hardware interfaces and algorithms are used for storing and retrieving log and behavior data; and the modules are designed to remain independent, which makes expansion easy and prevents any single module from becoming a bottleneck.
Description of drawings
Fig. 1 is a structural schematic diagram of the present invention.
Embodiment
The technical scheme of the present invention is described in further detail below through an embodiment and with reference to the accompanying drawing.
Embodiment: referring to Fig. 1, the main body of the system adopts a three-layer design consisting of a data collection layer, a data analysis layer, and a data storage and retrieval layer. The layers communicate over encrypted TCP, and each layer can be expanded independently, thereby improving the performance of an individual layer.
The data collection layer uses an industrial control board with PCI-E network interfaces, equipped with an Intel dual-core CPU, DDR2 memory, and a DOM (disk-on-module) disk. The Linux kernel is trimmed so that only the necessary drivers and modules remain. A boot image is built, and after startup the system runs entirely in memory to speed up operation. The capture program runs in this layer and uses zero-copy techniques; at the initial stage of data collection it filters out most of the unwanted data, reducing the load of analysis and copying. The collected data is formatted into a unified structure and sent to the data analysis layer over encrypted TCP.
The data analysis layer uses a server motherboard equipped with a 64-bit dual-core AMD Athlon CPU and SATA hard disks in RAID 5. The Linux kernel is trimmed so that only the necessary drivers and modules remain. A boot image is built, and after startup the system runs entirely in memory to speed up operation. The data analysis module runs in this layer and receives data from the collection layer. It analyzes each record against a tree-structured rule base, takes the corresponding action, and then sends the data to the data storage and retrieval layer over encrypted TCP.
The data storage and retrieval layer uses a server motherboard equipped with a 64-bit dual-core AMD Athlon CPU and SATA hard disks in RAID 5. The Linux kernel is trimmed so that only the necessary drivers and modules remain. A boot image is built, and after startup the system runs entirely in memory to speed up operation. The storage and retrieval program runs on this system and receives data from the data analysis layer. Received data is first stored in a ramdisk and, once a certain amount has accumulated, written to the hard disk in batches. On writing, each structure is first converted to a binary stream, and the structure's unique sequence number and its binary stream are then written to the raw data file in one-to-one correspondence.
After the data is written to disk, an index file is built for each field of the unified structure so that retrieval is efficient. The fields of the structure fall into two types, numeric and string. The two types differ not only in how the computer processes them but also in how they need to be retrieved (for example, numeric fields are usually queried with greater-than, less-than, and equal-to, while string fields are queried with contains or does-not-contain). The index files therefore differ according to the field type.
A numeric index file is built in binary form and contains two kinds of information: the value of the corresponding field in the structure, and the sequence number of the corresponding structure (the sequence number is unique for each record). At retrieval time, searching the relevant index file yields the set of sequence numbers that satisfy the condition, and this set is used to fetch the data from the raw data file.
For a string index file, the hash value of the string is computed first. The file contains three kinds of information: the original string of the corresponding field in the structure, the hash value of that string, and the sequence number of the corresponding structure (the sequence number is unique for each record). At retrieval time, searching the relevant index file yields the set of sequence numbers that satisfy the condition, and this set is used to fetch the data from the raw data file.
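The write path of this embodiment (buffer first, flush in batches, store each record under its sequence number) might look roughly like the sketch below. An in-memory list stands in for the ramdisk, and the `BatchWriter` class, the `RAW_HEAD` entry layout, the `batch_size` threshold, and the in-memory `offsets` map are all assumptions; the patent states only that data is written "after a certain amount".

```python
import io
import struct

RAW_HEAD = struct.Struct("!QI")  # (sequence number, payload length)


class BatchWriter:
    """Buffer records in memory and append them to the raw data file in batches."""

    def __init__(self, raw_file, batch_size=3):
        self.raw_file = raw_file
        self.batch_size = batch_size   # hypothetical flush threshold
        self.pending = []              # stands in for the ramdisk buffer
        self.offsets = {}              # seq -> byte offset in the raw data file

    def add(self, seq: int, payload: bytes) -> None:
        self.pending.append((seq, payload))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        self.raw_file.seek(0, io.SEEK_END)
        base = self.raw_file.tell()
        chunk = bytearray()
        for seq, payload in self.pending:
            self.offsets[seq] = base + len(chunk)
            chunk += RAW_HEAD.pack(seq, len(payload)) + payload
        self.raw_file.write(chunk)     # one write call per batch
        self.pending.clear()


def read_record(raw_file, offset: int):
    """Fetch one (seq, payload) pair from the raw data file."""
    raw_file.seek(offset)
    seq, n = RAW_HEAD.unpack(raw_file.read(RAW_HEAD.size))
    return seq, raw_file.read(n)
```

A retrieval would take the sequence numbers returned by an index query, map them to offsets, and call `read_record` for each one.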
In this whole process, the step that consumes the most performance, and also demands the most performance, is storage and retrieval. During storage and retrieval the hard disk is read and written frequently. To improve performance here, the system uses a design in which software cooperates with hardware. First, the software ensures as far as possible that data is written to disk in batches. Second, when writing in batches, the data to be written is split according to the DMA write block size. Finally, when reading and writing the hard disk, a suitable algorithm is used to reduce the number of disk seeks as far as possible. Because these read/write designs are closely tied to the hardware and the operating system kernel, efficiency can only be ensured by customizing the kernel and the hardware.
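Two of the ideas in this paragraph can be illustrated at user level, with the caveat that the real system does this inside a customized kernel and hardware stack. The sketch below splits a batch into fixed-size blocks before writing and serves read requests in ascending offset order to cut down on seeks. The 4096-byte `block_size` is a placeholder (the actual DMA block size depends on the hardware), and offset sorting is only one simple stand-in for the unspecified "suitable algorithm".

```python
import io


def write_in_blocks(f, data: bytes, block_size: int = 4096) -> int:
    """Write data in block_size pieces; returns the number of write calls."""
    view = memoryview(data)
    calls = 0
    for start in range(0, len(view), block_size):
        f.write(view[start:start + block_size])
        calls += 1
    return calls


def read_sorted(f, requests):
    """requests: list of (offset, length).

    Reads are issued in ascending offset order so the disk head moves in one
    direction, but results are returned in the original request order.
    """
    results = {}
    ordered = sorted(enumerate(requests), key=lambda item: item[1][0])
    for i, (offset, length) in ordered:
        f.seek(offset)
        results[i] = f.read(length)
    return [results[i] for i in range(len(requests))]
```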
Finally, it should be noted that the above embodiment is only a representative example of the present invention. Obviously, the technical scheme of the present invention is not limited to the above embodiment, and many variations are possible. All variations that a person of ordinary skill in the art can directly derive or infer from the disclosure of the present invention shall be considered within the protection scope of the present invention.

Claims (1)

1. A high-performance log and behavior auditing system, characterized in that the main body of the system adopts a three-layer design consisting of a data collection layer, a data analysis layer, and a data storage and retrieval layer; the layers communicate over encrypted TCP, and each layer can be expanded independently, thereby improving the performance of an individual layer; wherein
Data collection layer: uses an industrial control board with PCI-E network interfaces, equipped with an Intel dual-core CPU, DDR2 memory, and a DOM (disk-on-module) disk; the Linux kernel is trimmed so that only the necessary drivers and modules remain; a boot image is built, and after startup the system runs entirely in memory to speed up operation; the capture program runs in this layer and uses zero-copy techniques, filtering out most of the unwanted data at the initial stage of data collection to reduce the load of analysis and copying; the collected data is formatted into a unified structure and sent to the data analysis layer over encrypted TCP;
Data analysis layer: uses a server motherboard equipped with a 64-bit dual-core AMD Athlon CPU and SATA hard disks in RAID 5; the Linux kernel is trimmed so that only the necessary drivers and modules remain; a boot image is built, and after startup the system runs entirely in memory to speed up operation; the data analysis module runs in this layer and receives data from the collection layer, analyzes each record against a tree-structured rule base, takes the corresponding action, and then sends the data to the data storage and retrieval layer over encrypted TCP;
Data storage and retrieval layer: uses a server motherboard equipped with a 64-bit dual-core AMD Athlon CPU and SATA hard disks in RAID 5; the Linux kernel is trimmed so that only the necessary drivers and modules remain; a boot image is built, and after startup the system runs entirely in memory to speed up operation; the storage and retrieval program runs on this system and receives data from the data analysis layer; received data is first stored in a ramdisk and, once a certain amount has accumulated, written to the hard disk in batches; on writing, each structure is first converted to a binary stream, and the structure's unique sequence number and its binary stream are then written to the raw data file in one-to-one correspondence;
In the data storage and retrieval layer, after the data received from the data analysis layer has been written to the hard disk in batches, an index file is built for each field of the unified structure so that retrieval is efficient; the fields of the structure fall into two types, numeric and string; a numeric index file is built in binary form and contains two kinds of information, namely the value of the corresponding field in the structure and the sequence number of the corresponding structure; at retrieval time, searching the relevant index file yields the set of sequence numbers that satisfy the condition, and this set is used to fetch the data from the raw data file; for a string index file, the hash value of the string is computed first, and the file contains three kinds of information, namely the original string of the corresponding field in the structure, the hash value of that string, and the sequence number of the corresponding structure; at retrieval time, searching the relevant index file yields the set of sequence numbers that satisfy the condition, and this set is used to fetch the data from the raw data file.
CN2008100604862A 2008-04-23 2008-04-23 High-performance log and behavior auditing system Active CN101267338B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008100604862A CN101267338B (en) 2008-04-23 2008-04-23 High-performance log and behavior auditing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008100604862A CN101267338B (en) 2008-04-23 2008-04-23 High-performance log and behavior auditing system

Publications (2)

Publication Number Publication Date
CN101267338A CN101267338A (en) 2008-09-17
CN101267338B CN101267338B (en) 2010-10-13

Family

ID=39989487

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008100604862A Active CN101267338B (en) 2008-04-23 2008-04-23 High-performance log and behavior auditing system

Country Status (1)

Country Link
CN (1) CN101267338B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101465765B (en) * 2008-12-31 2011-04-13 东信和平智能卡股份有限公司 Log system and use method thereof
CN101989916A (en) * 2009-08-04 2011-03-23 西安交大捷普网络科技有限公司 Separating multi-stage buffer network content filtering system and method
CN102385549A (en) * 2010-09-02 2012-03-21 北京无限立通通讯技术有限责任公司 Log processing system, log processing method and log storage sub-system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1633080A (en) * 2003-12-24 2005-06-29 华为技术有限公司 Method for implementing log in network management system
CN1642097A (en) * 2004-01-02 2005-07-20 联想(北京)有限公司 Journal accounting method and system
CN1932812A (en) * 2005-09-16 2007-03-21 腾讯科技(深圳)有限公司 Method and apparatus for holding journal

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
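黄艺海. Design and Implementation of a Log Security Audit System. China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology Series, 2006, No. 05, pp. 19-54.
黄艺海. Design and Implementation of a Log Security Audit System. China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology Series, 2006, No. 05, pp. 19-54. *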

Also Published As

Publication number Publication date
CN101267338A (en) 2008-09-17

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant