Summary of the invention
In view of the above, the object of the invention is to provide a log information management method and system that effectively resolve the latency of server-side log analysis, so that operators can learn the operating status of a website at the earliest possible time.
To achieve this goal, the invention provides the following technical scheme:
A website log information management system comprises: a log information management unit, a log information extraction and formatting unit, a unified data centralization unit, and a website log information provision unit, wherein:
The log information management unit is configured to: set the log format of servers of the same type to a preset format; periodically intercept and save log information; and select website log information from it for storage in a pre-established log collection server;
The log information extraction and formatting unit is configured to format the website log information stored in the log collection server;
The unified data centralization unit is configured to classify the formatted website log information to form a plurality of Data Marts, and to store them;
The website log information provision unit is configured to provide the corresponding website log information upon receiving a request to view website log information.
Preferably, the log collection server comprises a first-level log collection point and second-level log collection points;
The first-level log collection point is configured to store the website log information, selected from all network log information, whose bandwidth conditions are good;
The second-level log collection points are configured to store the website log information other than that stored at the first-level log collection point.
Preferably, the first-level log collection point and/or the second-level log collection points use RAID6 storage with partitioned virtual volumes.
Preferably, the log information extraction and formatting unit comprises:
an extraction unit, configured to extract network log information;
a conversion unit, configured to convert the network log information extracted by the extraction unit into network log information of a predetermined format; and
a loading unit, configured to store the network log information of the predetermined format.
Preferably, the log information extraction and formatting unit further comprises a trigger, configured to generate trigger signals that control the operation of the extraction unit, the conversion unit, and the loading unit.
Preferably, the trigger comprises a row trigger and a table trigger.
Preferably, the network log information processed by the log information extraction and formatting unit comprises basic-data-layer data, granularity-aggregation-layer data, and Data-Mart-layer data.
A website log information management method comprises:
configuring a unified log format for servers of the same type;
periodically intercepting and saving log information;
selecting website log information from the intercepted log information and storing it in a pre-established log collection server;
formatting the website log information stored in the log collection server, converting it into network log information that conforms to a predetermined format; and
classifying the website log information of the predetermined format to form a plurality of Data Marts, and storing them, so that the corresponding website log information can be provided to an operator upon receiving a request to view website log information.
Preferably, storing the network log information in the pre-established log collection server comprises:
presetting a first-level collection point and second-level collection points, so as to classify the collected log information;
the first-level log collection point selects and stores, from all the data, the website log information whose bandwidth conditions are good;
the second-level log collection points store the data other than the data stored at the first-level log collection point.
Preferably, before the website log information is formatted, the method further comprises:
dividing the network log information into basic-data-layer data, granularity-aggregation-layer data, and Data-Mart-layer data; each layer is further sub-layered, and within each layer all data are linked into a sequence, to facilitate data processing.
As can be seen from the above technical scheme, the invention uniformly configures the logs on the servers and presets a log collection server; the log collection server periodically stores website log information, the log information in the collection server is formatted, and the processed log information is stored centrally, so that it can be retrieved and analyzed at any time by website maintainers (also called website operators). That is, when a website maintainer needs to learn the current operating status of a website, the already-formatted website log information can be obtained and analyzed directly, which reduces processing delay and lets the maintainer learn the website's current operating status in time.
Embodiment
The technical scheme in the embodiments of the invention is described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the invention, without creative work, fall within the scope of protection of the invention.
As shown in Figure 1, the invention discloses a log information management system comprising: a log information management unit 1, a log information extraction and formatting unit 2, a unified data centralization unit 3, and a website log information provision unit 4.
The log information management unit 1 configures a unified log format per server type according to the classification of Internet servers (for example, the Apache web server, the Squid caching server, the FTP document server, the streaming-media origin server, and the streaming-media broadcast server), so that logs of the same server type are generated in a single uniform format. The log configuration adopts the standard W3C Internet log format wherever possible, to guarantee that later processing can extract and convert the largest possible proportion of the logs, and the logs are placed in a larger storage space. Because logs are updated continuously while the website runs, the website logs are intercepted periodically, ensuring that the operating status of the website for each period can be provided to the operator.
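The per-server-type format registry described above can be sketched minimally as follows. The field lists are hypothetical W3C-style examples; the patent only requires that servers of the same type share one configured format:

```python
# Minimal sketch of a unified per-server-type log format registry.
# Field names follow W3C extended-log conventions; the exact lists
# per server type are illustrative assumptions, not from the patent.
W3C_FIELDS = ["date", "time", "c-ip", "cs-method", "cs-uri", "sc-status", "sc-bytes"]

LOG_FORMATS = {
    "apache_web": W3C_FIELDS,
    "squid_cache": W3C_FIELDS + ["cs(Referer)"],
    "ftp_document": ["date", "time", "c-ip", "cs-method", "cs-uri", "sc-status"],
    "streaming_origin": W3C_FIELDS,
    "streaming_broadcast": W3C_FIELDS,
}

def format_for(server_type: str) -> list[str]:
    """Return the unified field list configured for a server type."""
    return LOG_FORMATS[server_type]
```

Because every server of one type emits the same field list, the downstream extraction and conversion steps can be written once per type rather than once per server.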
When a server is accessed, its logs are unavoidably generated at a fixed and unique location, which complicates log collection: a continuous log collection system must coexist with servers that provide uninterrupted external service, without the two interfering with each other. Therefore, on the premise of not affecting the servers, an independent log collection system and channel is created. A balance point is found between two dimensions, the physical distribution of the servers and network quality, and a log collection server is placed there; the log collection servers across the whole network form a complete log collection server group. Since log preservation should also be redundant, the whole network is divided into several large regions, and log servers in a master-slave relationship are placed between and within the regions: a first-level log collection point is established, and second-level log collection points are created at backbone nodes chosen from all data centers (nodes with sufficient bandwidth and relatively minimal delay to each data center). The storage space of each log collection point is provisioned at 50 times the maximum daily log output it collects, and each point uses RAID6 with partitioned virtual volumes to guarantee storage safety. Each log collection point also keeps a hot backup at a different site, synchronized in real time with rsync, so that a single-point outage of a collection point has no impact on log preservation.
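The sizing and two-tier routing rules above can be sketched as follows. The 50x capacity factor comes from the text; the bandwidth threshold used to route a source to the first-level point is a hypothetical parameter for illustration:

```python
def required_capacity(daily_peak_bytes: int, factor: int = 50) -> int:
    """Each collection point reserves 50x its maximum daily log output,
    per the provisioning rule described above."""
    return daily_peak_bytes * factor

def assign_collection_point(bandwidth_mbps: float, threshold: float = 100.0) -> str:
    """Route a log source by bandwidth condition: good-bandwidth sources
    go to the first-level point, the rest to second-level points.
    The numeric threshold is an assumption, not from the patent."""
    return "first-level" if bandwidth_mbps >= threshold else "second-level"
```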
The data extraction and formatting unit 2 further comprises an extraction unit, a conversion unit, and a loading unit; extracting, converting, and loading the collected logs is the ETL technique. The extraction unit extracts network log information; the conversion unit converts the network log information extracted by the extraction unit into network log information of a predetermined format; and the loading unit stores the network log information of the predetermined format.
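A minimal sketch of the three units, assuming a toy record layout of `ip uri status` (the real log fields and the "predetermined format" are not specified at this level of the text):

```python
import re

def extract(raw_lines):
    """Extraction unit: keep only lines that parse as log records."""
    pat = re.compile(r"^(\S+) (\S+) (\d{3})$")
    return [m.groups() for line in raw_lines if (m := pat.match(line))]

def transform(records):
    """Conversion unit: convert each record into the predetermined
    (here: dict) format."""
    return [{"ip": ip, "uri": uri, "status": int(status)}
            for ip, uri, status in records]

def load(rows, store):
    """Loading unit: append the formatted rows into the store and
    report how many were loaded."""
    store.extend(rows)
    return len(rows)
```

Chaining `load(transform(extract(lines)), store)` reproduces the ETL flow the unit implements.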
The whole derivation process is divided into several levels. Based on inheritance, traversal, and irreversibility, it can be stipulated that all Jobs may only call downward, never upward, though a downward call may skip levels. In this way, in an ETL tool such as DataStage, each Job is placed at a level according to its stage in the ETL process, and the relations between levels constrain the relations between Jobs, so that the call relationship of each Job is kept clear.
The data processed by ETL is therefore divided into three levels: the basic data layer, the granularity aggregation layer, and the Data Mart layer (that is, only after the Jobs of the basic data layer finish can the granularity-aggregation Jobs run, and only after aggregation can the Data Mart Jobs run); each layer is further divided into several smaller levels. Within each layer, all Jobs are linked into a sequence, so that when the daily run dispatches the Jobs of every layer, it only needs to run the per-layer sequences in order from low to high to guarantee that the data produced by every Job is consistent, avoiding data inconsistencies caused by Jobs being called in a confused order.
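The layered, low-to-high dispatch described above can be sketched as a simple scheduler. Jobs are modeled as callables; the layer ordering and in-layer sequencing are the only constraints the text imposes:

```python
def run_layers(layers):
    """Run ETL jobs layer by layer (basic data -> granularity
    aggregation -> Data Mart). Layers are given low to high; within a
    layer, jobs run in their fixed sequence, so every downstream job
    only ever sees fully produced upstream data."""
    results = []
    for layer in layers:      # low-to-high layer order
        for job in layer:     # fixed sequence inside the layer
            results.append(job())
    return results
```

Because a job can only "call downward", nothing in a later layer ever runs before the layers beneath it have completed.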
Because the ETL implementation processes massive log information and involves multiple systems, which are often core systems, the impact on the performance and reliability of the source systems must be reduced to a minimum when choosing technology. Therefore, the techniques in the following aspects can be adopted in the implementation:
A trigger is a special kind of stored procedure that is mainly fired by events. During log extraction, different trigger mechanisms are created for different types of log format, converting the logs into uniformly formatted log information.
There are two kinds of trigger: row triggers and table triggers.
Row trigger: when fired, this kind of trigger locks only the triggering row; the other rows of the table remain operable, but this kind of trigger cannot modify the table itself while firing.
Table trigger: when fired, this kind of trigger locks the whole table, so all operations on the table except retrieval are blocked at that moment; however, this kind of trigger cannot obtain the data as it was before or after the update.
Therefore the logs are split and intercepted at the initial stage of their generation, guaranteeing that no log update is encountered; the subsequent ETL process then only needs to ensure that the log files are delivered to the ETL site in time.
Meanwhile, to guarantee concurrent ETL processing, row triggers are chosen, which makes it possible for multiple ETL processes to handle each file, maximizing the efficiency gain for the system and also reducing the impact that the failure of a single ETL node may have on the whole ETL process.
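As a concrete illustration of a row-level trigger reformatting each record as it arrives, the sketch below uses SQLite (where triggers are inherently `FOR EACH ROW`); the table and trigger names are hypothetical, and the "formatting" is a trivial upper-casing stand-in:

```python
import sqlite3

# A row-level trigger that copies each inserted raw log row into a
# unified table as it arrives; only the fired row is involved, so
# multiple writers can proceed independently of one another.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_log(line TEXT);
CREATE TABLE unified_log(line TEXT);
CREATE TRIGGER fmt AFTER INSERT ON raw_log
FOR EACH ROW
BEGIN
    INSERT INTO unified_log VALUES (upper(NEW.line));
END;
""")
conn.execute("INSERT INTO raw_log VALUES ('get /index 200')")
rows = conn.execute("SELECT line FROM unified_log").fetchall()
```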
To improve derivation efficiency, an incremental export scheme is adopted in the implementation. Incremental derivation requires establishing an incremental start time; with this start time, the system has a starting point for finding newly changed records, and each subsequent run only needs to derive the records changed after that time point, updating the time point once the derivation is confirmed successful.
To guarantee data quality, automatic handling is needed. Its principle is: after the time point of the last successful increment, delete the records derived after it, and let the incremental export derive the new records again.
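The incremental-export checkpoint logic can be sketched as follows. Record shape (`{"ts": ...}`) is a hypothetical minimal stand-in; the essential behavior is "export only what changed after the checkpoint, then advance the checkpoint on success":

```python
def incremental_export(records, checkpoint):
    """Export only the records changed after the checkpoint.
    Returns (batch, new_checkpoint); the caller advances its stored
    checkpoint to new_checkpoint only after the export is confirmed
    successful. On failure it keeps the old checkpoint, deletes the
    partial output, and re-derives (the cleanup rule above)."""
    batch = [r for r in records if r["ts"] > checkpoint]
    new_checkpoint = max((r["ts"] for r in batch), default=checkpoint)
    return batch, new_checkpoint
```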
The unified data centralization unit 3 classifies the formatted website log information to form a plurality of Data Marts, and stores them.
For the establishment of Data Marts, ETL creates a complete platform with a consistent structure that can reflect historical changes; such a data storage platform lays a good foundation for establishing subject-oriented Data Marts developed according to user demand.
The value of all log analysis reports is embodied through user design. The user is the business expert, and the expert should actively take the angle of system implementation, helping to analyze the role played by each datum in every query report; a suitable data structure is then designed, and this forms the Data Mart.
The requirements for log analysis reports usually come from two sources: industry experts, and the users who work with the query and analysis reports. Wherever a requirement comes from, it is a definition of the roles of different data along different dimensions; these role definitions form a combination of data, and a reasonable combination of such data forms a complete set of Data Marts. As long as every required datum can be obtained from the original logs, its implementation can be concentrated into the Data Mart through the ETL process; and since the ETL implementation is relatively independent, the rapid extensibility of the whole analysis system is guaranteed.
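A Data Mart built from "roles of data along different dimensions" can be sketched as a dimension-keyed aggregation. The dimension names and the `bytes` measure are hypothetical examples:

```python
from collections import defaultdict

def build_mart(log_rows, dimensions):
    """Group formatted log rows by the chosen dimension combination and
    sum a traffic measure -- a minimal stand-in for a report-oriented
    Data Mart derived from the original logs via ETL."""
    mart = defaultdict(int)
    for row in log_rows:
        key = tuple(row[d] for d in dimensions)
        mart[key] += row["bytes"]
    return dict(mart)
```

Swapping the `dimensions` list yields a different mart from the same original logs, which is the extensibility property claimed above.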
The website log information provision unit 4 is configured to provide the corresponding website log information upon receiving a request to view website log information.
When the operator needs to maintain or manage a website, the website's log information must be retrieved and examined, and the website is adjusted through analysis of the log information. The operator submits an order to view the logs; after the website log information provision unit receives the order, it retrieves the log information stored in the unified data centralization unit and provides it to the operator.
In addition, the invention also discloses a method corresponding to the above system, comprising the following steps:
Step 21: configure a unified log format for servers of the same type.
The data source from which an operator analyzes the operating status of a website is the access logs of the various Internet servers, so the design and configuration of the logs are crucial for future centralized extraction and conversion. The various Internet servers already took industry log standards into account at design time, which makes it possible for the analysis system to unify the configuration of the log sources.
According to the classification of Internet servers, a unified log format is configured per server type, so that logs of the same server type are generated in a single uniform format.
Step 22: intercept log information periodically, and save the intercepted log information in a storage space.
Because logs are updated continuously, the interception work on the logs is extremely important; the real-time interception of logs captures the operating status of the website at that moment.
Step 23: select website log information from the intercepted log information and store it in the log collection server.
When a server is accessed, its logs are unavoidably generated at a fixed and unique location, which complicates log collection: a continuous log collection system must coexist with servers that provide uninterrupted external service, without the two interfering with each other. Therefore, on the premise of not affecting the servers, an independent log collection system and channel is created; a balance point is found between the physical distribution of the servers and network quality, a log collection server is placed there, and the log collection servers across the whole network form a complete log collection server group. Since log preservation should be redundant, the whole network is divided into several large regions, and log servers in a master-slave relationship are placed between and within the regions, preventing a single point of failure from interrupting log collection. Meanwhile, to guarantee the storage safety of each single log collection server, the storage architecture is built with latest-generation SATA hard disks and RAID6 with partitioned virtual volumes.
The log collection system mainly consists of the distributed regional log collection servers and a central log storage server. To allow recomputation over the original logs, each regional server preserves its logs for 7 days, while the central log storage server stores them permanently.
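The regional 7-day retention rule can be sketched as a pruning routine; the file representation (`mtime` in epoch seconds) is a hypothetical stand-in:

```python
def prune_regional(files, now, retention_days=7):
    """Regional collection servers keep only the last 7 days of logs;
    the central store keeps everything. Returns the files that a
    regional server should delete (those older than the cutoff)."""
    cutoff = now - retention_days * 86400  # seconds in a day
    return [f for f in files if f["mtime"] < cutoff]
```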
Step 24: convert the website log information stored in the log collection server into network log information that conforms to the predetermined format.
Once the logs reach the log collection server, the problems of how to split, extract, convert, and load them into the data warehouse must be faced. Because the log analysis system requires some analysis content, such as traffic bandwidth and visitor counts, to be timely, efficiency in this step is particularly important. Therefore a multistage splitting technique can be adopted in this link: all logs are first de-duplicated, cleaned of errors, and formatted; this link runs once every 5 minutes, and the logs gathered at the center are timestamped and then processed. Meanwhile, for different types of log, different extraction and formatting standards are adopted. Through the first-level splitting and extraction, the logs are classified by type, region, and service domain name; a second extraction is then performed on the classified logs, classifying the log information by different granularities and formatting it, so as to provide data that is as accurate and well-formatted as possible for the Data Mart computations.
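The two-stage split can be sketched as follows. The classification keys (type, region, domain) come from the text; the toy line layout `type region domain` and the malformed-line rule are illustrative assumptions:

```python
def first_stage(lines):
    """Stage 1 (run roughly every 5 minutes): de-duplicate, drop
    malformed lines, and parse each surviving line into its
    classification keys."""
    seen, clean = set(), []
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or line in seen:  # de-error + de-duplicate
            continue
        seen.add(line)
        clean.append({"type": parts[0], "region": parts[1], "domain": parts[2]})
    return clean

def second_stage(records):
    """Stage 2: re-split the classified logs into buckets keyed by
    (type, region, domain), ready for per-granularity formatting."""
    buckets = {}
    for r in records:
        buckets.setdefault((r["type"], r["region"], r["domain"]), []).append(r)
    return buckets
```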
Step 25: classify the website log information of the predetermined format to form a plurality of Data Marts, and store them, so that the corresponding website log information can be provided to the operator upon receiving a request to view website log information.
It should be noted that, as those skilled in the art will readily understand, the website maintainer, website operator, and manager mentioned above are all the same concept; each needs the website logs to be processed, and the details are not repeated herein.
As can be seen from the above, the embodiments of the invention periodically obtain website log information and store it centrally after formatting, so that when a website maintainer needs to learn the current operating status of a website, the formatted website log information can be obtained and analyzed directly, without first performing format operations, thereby reducing processing delay. The website maintainer can thus learn the website's current operating status in time and formulate more effective operating strategies.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the present invention is not limited to the embodiments shown herein, but accords with the widest scope consistent with the principles and novel features disclosed herein.