CN101267338B - High-performance log and behavior auditing system - Google Patents
- Publication number
- CN101267338B (application CN2008100604862A)
- Authority
- CN
- China
- Prior art keywords
- data
- layer
- file
- retrieval
- operated
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The invention discloses a high-performance log and behavior auditing system. The main body of the system uses a three-layer structure composed of a data collection layer, a data analysis layer, and a data storage and retrieval layer. The layers communicate over encrypted TCP, and each layer can be expanded independently to improve that layer's performance. The system is built entirely on embedded systems, with software and hardware optimized specifically for collection. For storing and retrieving log and behavior data it uses dedicated software-hardware interfaces and algorithms. The modules are designed to remain independent so that they are easy to expand and no single module becomes a bottleneck.
Description
Technical field
The invention belongs to the field of log and behavior auditing, and specifically relates to a high-performance log and behavior auditing system.
Background technology
Existing log and behavior auditing systems collect, analyze, and store data mainly on Linux/Unix or Windows servers, using a general-purpose relational database as the platform on which the collection, analysis, and storage applications are developed. For workloads with data volumes as large as logs and behavior records, however, such a platform often cannot meet the performance requirements for collection, analysis, and retrieval. The reasons are: (1) to meet business and stability needs, a server runs many services and daemons, most of which are unnecessary for an auditing application; (2) besides storage and query, a general-purpose relational database must provide many data-association functions, so it cannot deliver the fast retrieval that auditing needs over data whose relationships are simple but whose volume is huge; (3) in the applications typically developed, collection, analysis, and storage are not fully independent, so once one module becomes a bottleneck the whole system becomes difficult to use.
Summary of the invention
The present invention addresses the above technical problems in the prior art by providing a high-performance log and behavior auditing system.
The above technical problems are mainly solved by the following technical solution: the main body of the system adopts a three-layer design consisting of a data collection layer, a data analysis layer, and a data storage and retrieval layer. The layers communicate over encrypted TCP, and each layer can be scaled independently, improving the performance of that individual layer.
The data collection layer uses an industrial control board with PCI-E network interfaces, configured with an Intel dual-core CPU, DDR2 memory, and a DOM (Disk on Module) disk. The Linux kernel is trimmed so that only the necessary drivers and modules remain. A boot image is built so that, once started, the system runs entirely in memory, which speeds up operation. The capture program runs in this layer and uses zero-copy techniques; it filters out most junk data at the earliest stage of collection, reducing the load of analysis and copying. Collected data is formatted into a unified structure and sent to the data analysis layer over encrypted TCP.
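As a rough illustration of the last two steps of the collection layer (early filtering and formatting into a unified structure), the following Python sketch packs each accepted event into a fixed binary header followed by variable-length fields. The field names (`seq`, `timestamp`, `src_ip`, `user`, `action`), the byte layout, and the filter rule are assumptions made for illustration; the patent does not specify them, and the sketch does not attempt to show zero-copy capture.

```python
import socket
import struct

# Hypothetical fixed header: sequence number, timestamp, source IPv4 address,
# and the byte lengths of the two variable-length fields that follow.
HEADER = struct.Struct("!QI4sHH")

def keep(event: dict) -> bool:
    """Early filter (assumed rule): drop events with no user and keepalives."""
    return bool(event.get("user")) and event.get("action") != "keepalive"

def pack_event(seq: int, event: dict) -> bytes:
    """Format one event into the unified binary structure."""
    user = event["user"].encode("utf-8")
    action = event["action"].encode("utf-8")
    header = HEADER.pack(seq, event["timestamp"],
                         socket.inet_aton(event["src_ip"]),
                         len(user), len(action))
    return header + user + action

def unpack_event(blob: bytes) -> dict:
    """Inverse of pack_event, as the analysis layer might apply it."""
    seq, ts, ip, ulen, alen = HEADER.unpack_from(blob)
    body = blob[HEADER.size:]
    return {"seq": seq, "timestamp": ts, "src_ip": socket.inet_ntoa(ip),
            "user": body[:ulen].decode("utf-8"),
            "action": body[ulen:ulen + alen].decode("utf-8")}

def collect(events):
    """Yield packed records for the events that pass the filter."""
    seq = 0
    for event in events:
        if keep(event):
            seq += 1  # sequence numbers are assigned only to kept events
            yield pack_event(seq, event)
```

Filtering before packing means rejected events never consume a sequence number or any bytes on the link to the analysis layer.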
The data analysis layer uses a server motherboard configured with an AMD Athlon 64 dual-core CPU and SATA hard disks in RAID 5. The Linux kernel is trimmed so that only the necessary drivers and modules remain. A boot image is built so that, once started, the system runs entirely in memory, which speeds up operation. The data analysis module runs in this layer and receives data from the collection layer. It analyzes each record against a tree-structured rule base, takes the corresponding action, and then sends the data to the data storage and retrieval layer over encrypted TCP.
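The patent does not describe how the tree-structured rule base is laid out. One plausible reading, sketched below in Python, is a decision tree in which each node tests the record and the first matching leaf determines the action. The `Node` class, the record fields, and the action names (`alert`, `store`, `drop`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Node:
    """One rule node: a test on the record, then an action and/or child rules."""
    test: Callable[[dict], bool]
    action: Optional[str] = None
    children: List["Node"] = field(default_factory=list)

def evaluate(node: Node, record: dict) -> Optional[str]:
    """Depth-first walk; return the action of the first matching leaf, if any."""
    if not node.test(record):
        return None
    for child in node.children:
        result = evaluate(child, record)
        if result is not None:
            return result
    # No child matched (or this is a leaf): use this node's own action.
    return node.action

RULES = Node(
    test=lambda r: True,
    action="store",  # default when no more specific rule applies
    children=[
        Node(test=lambda r: r["action"] == "login", children=[
            Node(test=lambda r: r["user"] == "root", action="alert"),
            Node(test=lambda r: r["failed"] >= 3, action="alert"),
        ]),
        Node(test=lambda r: r["action"] == "heartbeat", action="drop"),
    ],
)
```

Because a subtree is skipped as soon as its root test fails, a record is checked only against the rules on the branches it actually matches rather than against every rule.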
The data storage and retrieval layer uses a server motherboard configured with an AMD Athlon 64 dual-core CPU and SATA hard disks in RAID 5. The Linux kernel is trimmed so that only the necessary drivers and modules remain. A boot image is built so that, once started, the system runs entirely in memory, which speeds up operation. The storage and retrieval software runs on this system and receives data from the data analysis layer. Received data is first stored in a ramdisk and, once a certain amount has accumulated, written to hard disk in batches. On writing, each structure is first converted to a binary stream, and then the structure's unique sequence number and its binary stream are written to the raw data file in one-to-one correspondence.
After the data is written to disk, an index file is built for each field of the unified structure so that retrieval is efficient. The fields of the structure fall into two kinds, numeric and string. The two kinds differ not only in how the computer processes them but also in how they need to be searched: numeric fields are generally searched by greater-than, less-than, and equal-to comparisons, whereas string fields are searched by contains or does-not-contain. The index files therefore differ by type.
A numeric index file is built in binary form and holds two pieces of information: the value of the corresponding field in the structure, and the sequence number of the corresponding structure (unique for each record). A string index file is built by first computing the hash value of the string, and holds three pieces of information: the original string of the corresponding field, the string's hash value, and the sequence number of the corresponding structure. In both cases, searching the relevant index file yields the set of sequence numbers that satisfy the query, and that set is used to fetch the records from the raw data file.
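The storage and index scheme described above can be sketched in Python as follows. This is a minimal in-memory model, not the patented implementation: the byte layouts, the offset table, sorting numeric entries by value, and the use of CRC32 as the string hash are all assumptions. The patent also does not say how the stored hash is used; here it only speeds up exact matches, while substring queries scan the original strings.

```python
import struct
import zlib
from bisect import bisect_left, bisect_right

RAW_HDR = struct.Struct("<QI")    # sequence number, payload length (assumed layout)
NUM_ENTRY = struct.Struct("<qQ")  # field value, sequence number (assumed layout)

def append_raw(raw: bytearray, offsets: dict, seq: int, payload: bytes) -> None:
    """Append (sequence number, binary stream) to the raw data and record its offset."""
    offsets[seq] = len(raw)
    raw.extend(RAW_HDR.pack(seq, len(payload)) + payload)

def read_raw(raw: bytearray, offsets: dict, seq: int) -> bytes:
    """Fetch the binary stream stored under a sequence number."""
    _, length = RAW_HDR.unpack_from(raw, offsets[seq])
    start = offsets[seq] + RAW_HDR.size
    return bytes(raw[start:start + length])

def build_numeric_index(pairs) -> bytes:
    """pairs: iterable of (value, seq). Entries are stored sorted by value."""
    return b"".join(NUM_ENTRY.pack(v, s) for v, s in sorted(pairs))

def query_numeric(index: bytes, lo: int, hi: int) -> set:
    """Return the sequence numbers whose value v satisfies lo <= v <= hi."""
    entries = list(NUM_ENTRY.iter_unpack(index))
    values = [v for v, _ in entries]
    i, j = bisect_left(values, lo), bisect_right(values, hi)
    return {s for _, s in entries[i:j]}

def build_string_index(pairs) -> list:
    """pairs: iterable of (text, seq). Each entry is (original string, hash, seq)."""
    return [(t, zlib.crc32(t.encode("utf-8")), s) for t, s in pairs]

def query_equals(index: list, text: str) -> set:
    """Exact match: compare hashes first, then confirm on the original string."""
    h = zlib.crc32(text.encode("utf-8"))
    return {s for t, hv, s in index if hv == h and t == text}

def query_contains(index: list, needle: str) -> set:
    """Substring match: scans the stored original strings."""
    return {s for t, _, s in index if needle in t}
```

Each query returns a set of sequence numbers, so conditions on several fields can be combined with set intersection or difference before any record is read from the raw data.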
The high-performance log and behavior auditing system of the present invention is designed entirely on embedded systems. Software and hardware are specifically optimized for collection, and the storage and retrieval of log and behavior data use dedicated software-hardware interfaces and algorithms. The modules are kept independent by design, which makes them easy to expand and keeps any single module from becoming a bottleneck.
Description of drawings
Fig. 1 is a structural schematic diagram of the present invention.
Embodiment
The technical solution of the present invention is described in further detail below through an embodiment and with reference to the accompanying drawing.
Embodiment: referring to Fig. 1, the main body of the system adopts a three-layer design consisting of a data collection layer, a data analysis layer, and a data storage and retrieval layer. The layers communicate over encrypted TCP, and each layer can be scaled independently, improving the performance of that individual layer.
Across this whole process, the step that consumes the most resources and also has the highest performance requirements is storage and retrieval. Storage and retrieval cause frequent hard-disk reads and writes, so to improve performance here the system uses a design in which software cooperates with hardware. First, the software ensures as far as possible that data is written to disk in batches. Second, when writing in batches, the data to be written is split according to the DMA write block size. Finally, when reading and writing the disk, suitable algorithms are used to minimize the number of disk seeks. Because these read/write designs are closely tied to the hardware and the operating system kernel, efficiency can only be ensured by customizing the kernel and hardware.
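The three software measures can be illustrated at user level with the Python sketch below. It is only an approximation of the idea: the real system relies on a customized kernel and hardware, the `BLOCK_SIZE` value is an assumed stand-in for the DMA write block size, and serving reads in ascending offset order is one simple seek-reduction strategy chosen for illustration, not necessarily the algorithm the patent uses.

```python
BLOCK_SIZE = 4096  # assumed DMA write block size in bytes; hardware-specific in practice

class BatchWriter:
    """Buffer records in memory and flush them as block-sized writes."""

    def __init__(self, sink, threshold: int = 4 * BLOCK_SIZE):
        self.sink = sink            # any binary file-like object
        self.threshold = threshold  # flush once this many bytes are buffered
        self.buffer = bytearray()
        self.writes = 0             # number of write calls issued, for inspection

    def add(self, record: bytes) -> None:
        """Queue one record; flush automatically when the threshold is reached."""
        self.buffer.extend(record)
        if len(self.buffer) >= self.threshold:
            self.flush()

    def flush(self) -> None:
        """Write the buffer out in BLOCK_SIZE pieces (the last piece may be shorter)."""
        for start in range(0, len(self.buffer), BLOCK_SIZE):
            self.sink.write(self.buffer[start:start + BLOCK_SIZE])
            self.writes += 1
        self.buffer.clear()

def read_in_offset_order(source, requests):
    """Serve (offset, length) reads in ascending offset order to limit seeking."""
    results = {}
    for offset, length in sorted(requests):
        source.seek(offset)
        results[offset] = source.read(length)
    return results
```

With the defaults shown, five 4000-byte records trigger a single flush that issues five writes: four full 4096-byte blocks and one shorter tail.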
Finally, it should be noted that the above embodiment is only a representative example of the present invention. The technical solution of the invention is clearly not limited to the above embodiment, and many variations are possible. All variations that a person of ordinary skill in the art can directly derive or conceive from the disclosure of the present invention should be considered within the scope of protection of the invention.
Claims (1)
1. A high-performance log and behavior auditing system, characterized in that the main body of the system adopts a three-layer design consisting of a data collection layer, a data analysis layer, and a data storage and retrieval layer; encrypted TCP transmission is used between the layers; and each layer can be expanded independently, thereby improving the performance of that individual layer; wherein
Data collection layer: uses an industrial control board with PCI-E network interfaces, configured with an Intel dual-core CPU, DDR2 memory, and a DOM disk; the Linux kernel is trimmed so that only the necessary drivers and modules remain; a boot image is built so that the system runs entirely in memory after startup, to speed up system operation; the capture program runs in this layer and uses zero-copy techniques, filtering out most junk data at the initial stage of data collection to reduce the load of analysis and copying; the collected data is formatted into a unified structure and transmitted to the data analysis layer over encrypted TCP;
Data analysis layer: uses a server motherboard configured with an AMD Athlon 64 dual-core CPU and SATA hard disks in RAID 5; the Linux kernel is trimmed so that only the necessary drivers and modules remain; a boot image is built so that the system runs entirely in memory after startup, to speed up system operation; the data analysis module runs in this layer and receives data from the collection layer, analyzes each record against a tree-structured rule base, takes the corresponding action, and then transmits the data to the data storage and retrieval layer over encrypted TCP;
Data storage and retrieval layer: uses a server motherboard configured with an AMD Athlon 64 dual-core CPU and SATA hard disks in RAID 5; the Linux kernel is trimmed so that only the necessary drivers and modules remain; a boot image is built so that the system runs entirely in memory after startup, to speed up system operation; the storage and retrieval software runs on this system and receives data from the data analysis layer; the received data is first stored in a ramdisk and, once a certain amount has accumulated, written to hard disk in batches; on writing, each structure is first converted to a binary stream, and then the structure's unique sequence number and its binary stream are written to the raw data file in one-to-one correspondence;
In the data storage and retrieval layer, after the data received from the data analysis layer has been written to hard disk in batches, an index file is built for each field of the unified structure so that retrieval is efficient; the fields of the structure are divided into two kinds, numeric and string; a numeric index file is built in binary form and contains two pieces of information, namely the value of the corresponding field in the structure and the sequence number of the corresponding structure; on retrieval, searching the corresponding index file yields the set of sequence numbers that satisfy the query, and the data is obtained from the raw data file by means of this set; a string index file is built by first computing the hash value of the string and contains three pieces of information, namely the original string of the corresponding field in the structure, the hash value of the string, and the sequence number of the corresponding structure; on retrieval, searching the corresponding index file yields the set of sequence numbers that satisfy the query, and the data is obtained from the raw data file by means of this set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2008100604862A CN101267338B (en) | 2008-04-23 | 2008-04-23 | High-performance log and behavior auditing system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101267338A (en) | 2008-09-17 |
CN101267338B (en) | 2010-10-13 |
Family
ID=39989487
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2008100604862A (Active, CN101267338B) | High-performance log and behavior auditing system | 2008-04-23 | 2008-04-23 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101267338B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101465765B (en) * | 2008-12-31 | 2011-04-13 | 东信和平智能卡股份有限公司 | Log system and use method thereof |
CN101989916A (en) * | 2009-08-04 | 2011-03-23 | 西安交大捷普网络科技有限公司 | Separating multi-stage buffer network content filtering system and method |
CN102385549A (en) * | 2010-09-02 | 2012-03-21 | 北京无限立通通讯技术有限责任公司 | Log processing system, log processing method and log storage sub-system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1633080A (en) * | 2003-12-24 | 2005-06-29 | 华为技术有限公司 | Method for implementing log in network management system |
CN1642097A (en) * | 2004-01-02 | 2005-07-20 | 联想(北京)有限公司 | Journal accounting method and system |
CN1932812A (en) * | 2005-09-16 | 2007-03-21 | 腾讯科技(深圳)有限公司 | Method and apparatus for holding journal |
Non-Patent Citations (2)
Title |
---|
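Huang Yihai (黄艺海). Design and Implementation of a Log Security Audit System (日志安全审计系统的设计与实现). China Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology series, 2006, No. 05, pp. 19-54. |
Huang Yihai (黄艺海). Design and Implementation of a Log Security Audit System (日志安全审计系统的设计与实现). China Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology series, 2006, No. 05, pp. 19-54. * |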
Also Published As
Publication number | Publication date |
---|---|
CN101267338A (en) | 2008-09-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |