CN203039704U - Web log storage system - Google Patents

Web log storage system

Info

Publication number
CN203039704U
Authority
CN
China
Legal status
Expired - Fee Related
Application number
CN 201220389766
Other languages
Chinese (zh)
Inventor
李晓亮
Current Assignee
BEIJING DINGZHEN TECHNOLOGY Co Ltd
Original Assignee
BEIJING DINGZHEN TECHNOLOGY Co Ltd
Priority date
2012-08-07
Filing date
2012-08-07
Publication date
2013-07-03
Application filed by BEIJING DINGZHEN TECHNOLOGY Co Ltd
Priority to CN 201220389766
Application granted
Publication of CN203039704U
Expired - Fee Related (current legal status)

Landscapes

  • Debugging And Monitoring (AREA)
  • Computer And Data Communications (AREA)

Abstract

The utility model provides a web log storage system based on bypass mirroring, which solves the problems of the prior art. The system obtains access data by bypass mirroring: traffic to and from the visited website is mirrored, and the original data packets of users' visits to the website are obtained. After a behavior analysis module classifies each access by behavior type, web logs can be recorded in various formats. The technical scheme places no burden on the web server, and the log format is independent of the choice of web server. In a conventional networking model, the WEB servers are connected to a network switch, and the WEB server entities themselves store the web logs and perform other related functions.

Description

A web log storage system
Technical field
The utility model relates to the field of communication technology, and in particular to a web log storage system.
Background
A web log file is a file, usually with the .log extension, in which a web server records raw information such as the requests it receives and processes and the runtime errors it encounters. From web logs one can learn who accessed which content of a website, when, and with what tool; they are the most basic data source for web analytics and website data retention. Preserving web logs completely and accurately is therefore a necessary foundation for keeping a web server running normally.
In the prior art, web logs are recorded by the WEB server itself: when an access occurs, the WEB server writes information about that access as text, in a preconfigured log format, either locally or to another server on the network.
However, each WEB server generally supports only its own log formats, such as the NCSA log format supported by Apache and the W3C log format supported by IIS; most log analysis tools support NCSA and at least one W3C format. Other WEB servers, such as nginx, have their own default log format and generally must be manually configured to emit NCSA format before log analysis software can be used conveniently. The prior art generally has the following problems:
1. Access logs are recorded by the web server, which must not only respond to visitors' requests but also write access logs, increasing the server's load. The information for every access is gathered synchronously by the web server while it handles the request, which degrades web server performance.
2. The log format depends on the web server in use, which greatly limits the choice of web log analysis tools. Traditional web log formats are tied to the web server: choosing a server means choosing a log format, or, conversely, using a particular log format forces the choice of a particular server.
3. Log configuration is complex; some web servers can only be configured for logging through configuration files, which requires considerable computer expertise. In addition, web servers generally provide no filtering of logs already generated, so existing logs cannot be screened.
4. Log recording has no intelligence. Existing web logs simply record the intrinsic information carried by web messages and have no behavior analysis capability, so the logs look the same whether an access is an attack or a normal visit. Professional staff are generally needed to infer the nature of accesses, and if a website is attacked, finding clues of the attack among a large volume of access logs is like looking for a needle in a haystack.
Summary of the utility model
To address the above shortcomings of conventional website logging, the purpose of the utility model is to provide a web log storage system, method, and device based on bypass mirroring, thereby solving the problems of the prior art. The utility model obtains access data by bypass mirroring: traffic to the visited website is mirrored to a bypass device, which obtains the original data packets of users' visits to the website; after a behavior analysis module classifies each access by behavior, the accesses can be recorded as web logs in multiple formats. The technical solution places no burden on the web server, and the choice of log format is completely independent of the web server. In the traditional networking model, the WEB servers are connected to the network switch and the WEB server entities perform functions such as web log storage. In the networking scheme of the utility model, a separate device is deployed in bypass on the switch and stores and queries the web logs, so the WEB server entities only need to answer website requests.
The technical solution disclosed by the utility model is as follows:
A web log storage system comprises a firewall, a network switch, and a web server. The network switch has a mirror port, to which a log storage server is connected; the mirror port obtains, by traffic mirroring, the communication data of the switch's communication ports and delivers it to the log storage server.
Preferably, the log storage server comprises a traffic collection module, an HTTP protocol analysis module, a Request message analysis module, a Response message analysis module, a behavior analysis module, a log condition check module, and a web log storage module, connected in that order.
Preferably, the web log storage system further comprises a web log screening module, which screens web logs according to conditions specified by a requesting client and returns the results to that client.
A method of storing logs using the web log storage system comprises the following steps:
S1: obtain, through the mirror port, all packets received and sent by the web server;
S2: analyze the packets and extract the HTTP protocol packets from them;
S3: analyze the Request message data in the HTTP protocol packets to obtain the Request message necessary information;
S4: analyze the Response message data in the HTTP protocol packets to obtain the Response message necessary information;
S5: analyze the Request message necessary information and/or the Response message necessary information to obtain the access behavior type;
S6: compare the Request message necessary information and/or the Response message necessary information and/or the access behavior type with preset conditions; if the preset conditions are met, buffer the Request message and wait for the corresponding Response message; once that Response message has been obtained, combine the corresponding Request and Response messages into a complete access process, and save the complete access process in a preset format to a database and/or log file as a web log.
Preferably, the method further comprises the following step:
S7: according to screening conditions set by a requesting client, filter the qualifying log records from the database and/or log file, save them as a new file, and return it to the requesting client.
Preferably, the preset conditions, the preset format, and the screening conditions are all set through a web page.
Preferably, the Request message necessary information includes the visitor's IP address, the specific domain name and URL accessed, the Referer information, the User-Agent, and the Cookies carried; the Response message necessary information includes the response status code, the content type carried, and the message length.
Preferably,
S1 specifically: obtain through the mirror port all messages sent to and from the web server, and separate them into uplink and downlink traffic; and/or
S2 specifically: identify the HTTP protocol messages by analyzing the content of the TCP payload in the uplink and downlink traffic; and/or
S3 specifically: decode the Request messages among the HTTP protocol messages, extract the Request necessary information, and buffer it; and/or
S4 specifically: decode the Response messages among the HTTP protocol messages, extract the Response necessary information, and buffer it; and/or
S5 specifically: analyze the visitor's access behavior according to the information carried by the Request and Response messages, and determine the behavior type of the access; and/or
S6 specifically: compare the Request necessary information and/or the Response necessary information and/or the access behavior type with the preset log conditions; if the conditions are met, buffer the Request packet containing the Request necessary information and wait for the corresponding Response message; once it has been obtained, merge the Request necessary information of the Request message with the Response necessary information of the corresponding Response message into one complete access process, combine it with the log entries according to the preset log format into a final web log, write the web log to the database and/or log file, and build a search index for it.
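The structure of the search index built at the end of S6 is not disclosed. As one possibility, a minimal in-memory inverted index could map (field, value) pairs to record positions, so that a query on several fields becomes a set intersection. The class name LogIndex, the indexed field names, and the use of Python dicts as log records are all assumptions for illustration; a real deployment would more likely rely on the indexing of the database that stores the logs.

```python
from collections import defaultdict


class LogIndex:
    """Hypothetical in-memory inverted index over stored web log records."""

    def __init__(self, fields=("client_ip", "host", "status", "behavior")):
        self.fields = set(fields)
        self.records = []              # stand-in for the database / log file
        self.index = defaultdict(set)  # (field, value) -> set of record ids

    def add(self, record: dict) -> int:
        """Store a record, index its indexed fields, and return its id."""
        rid = len(self.records)
        self.records.append(record)
        for name in self.fields:
            if name in record:
                self.index[(name, record[name])].add(rid)
        return rid

    def query(self, **conditions) -> list:
        """Return records matching every field=value condition, in insertion order."""
        ids = None
        for name, value in conditions.items():
            if name not in self.fields:
                raise ValueError(f"field {name!r} is not indexed")
            hits = self.index.get((name, value), set())
            ids = set(hits) if ids is None else ids & hits
        if ids is None:  # no conditions: everything matches
            return list(self.records)
        return [self.records[i] for i in sorted(ids)]
```

For example, `LogIndex().query(status=404, behavior="attack")` intersects two posting sets instead of scanning every stored record.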
A device for storing logs using the web log storage system comprises:
a traffic collection module, for obtaining through the mirror port all packets received and sent by the web server;
an HTTP protocol analysis module, for analyzing the packets and extracting the HTTP protocol packets from them;
a Request message analysis module, for analyzing the Request message data in the HTTP protocol packets to obtain the Request message necessary information;
a Response message analysis module, for analyzing the Response message data in the HTTP protocol packets to obtain the Response message necessary information;
a behavior analysis module, for analyzing the Request message necessary information and/or the Response message necessary information to obtain the access behavior type;
a log condition check module, for comparing the Request message necessary information and/or the Response message necessary information and/or the access behavior type with the preset conditions and, if they are met, passing the data to the next processing step;
a web log storage module, for saving the complete access process in a preset format to a database and/or log file as a web log.
Preferably, the device further comprises a web log screening module, for screening web logs according to specified conditions.
The beneficial effects of the utility model are:
1. Recording and storing web logs has no effect on the website: no website configuration needs to be changed and no web pages need to be rewritten, so the system is plug-and-play;
2. Because data collection is performed by the traffic collection module on a bypass device, the scheme does not degrade web server performance, allowing the web server to save resources, handle more concurrent requests, and compute faster;
3. The behavior analysis module classifies accesses intelligently, so attacks, crawlers, normal visits, and so on are easy to tell apart;
4. The log format of the scheme is independent of the web server in use; for example, logs in W3C format can be obtained even with an Apache server;
5. The log filter module of the utility model can directly output to the user the log content that matches the user's request.
Description of drawings
Fig. 1 is a schematic structural diagram of the web log storage system disclosed by the utility model;
Fig. 2 is a flow chart of the steps of the method of storing logs using the disclosed web log storage system;
Fig. 3 is a schematic block diagram of the device for storing logs using the disclosed web log storage system.
Embodiment
To make the technical problem solved, the technical solution, and the beneficial effects of the utility model clearer, the utility model is further described below with reference to the drawings. It should be understood that the embodiment described here only explains the utility model and does not limit it.
As shown in Fig. 1, the utility model discloses a web log storage system comprising a firewall, a network switch, and a web server. The network switch has a mirror port, to which a log storage server is connected; the mirror port obtains, by traffic mirroring, the communication data of the switch's communication ports and delivers it to the log storage server. The log storage server comprises a traffic collection module, an HTTP protocol analysis module, a Request message analysis module, a Response message analysis module, a behavior analysis module, a log condition check module, and a web log storage module, connected in that order. The web log storage system further comprises a web log screening module, which screens web logs according to conditions specified by a requesting client and returns the results to that client.
As shown in Fig. 2, the utility model discloses a method of storing logs using the web log storage system, comprising the following steps:
S1: obtain through the mirror port all packets received and sent by the web server. Specifically, all messages sent to and from the web server are obtained through the mirror port and separated into uplink and downlink traffic;
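A sketch of the S1 direction split, assuming the mirror port delivers plain Ethernet II frames carrying IPv4/TCP (no VLAN tags, no IPv6, no IP fragments) and that the web server listens at the hypothetical address 192.0.2.10:80. Frames would come from a raw capture on the interface attached to the mirror port, for example a Linux AF_PACKET socket or libpcap; the capture loop itself is omitted, and TCP stream reassembly is not attempted.

```python
import socket
import struct
from typing import NamedTuple, Optional

# Hypothetical web server address (documentation range); not given in the source.
WEB_SERVER = ("192.0.2.10", 80)


class Segment(NamedTuple):
    src: tuple      # (ip, port)
    dst: tuple      # (ip, port)
    payload: bytes  # TCP payload, possibly empty


def parse_frame(frame: bytes) -> Optional[Segment]:
    """Decode an Ethernet II / IPv4 / TCP frame; return None for anything else."""
    if len(frame) < 34 or struct.unpack("!H", frame[12:14])[0] != 0x0800:
        return None  # too short, or not IPv4
    ip = frame[14:]
    ihl = (ip[0] & 0x0F) * 4
    (total_len,) = struct.unpack("!H", ip[2:4])
    if ip[9] != 6 or ihl < 20:
        return None  # not TCP, or malformed IP header
    tcp = ip[ihl:total_len]  # slicing to total_len drops Ethernet padding
    if len(tcp) < 20:
        return None
    src_port, dst_port = struct.unpack("!HH", tcp[:4])
    data_offset = (tcp[12] >> 4) * 4
    return Segment(
        (socket.inet_ntoa(ip[12:16]), src_port),
        (socket.inet_ntoa(ip[16:20]), dst_port),
        tcp[data_offset:],
    )


def direction(seg: Segment) -> str:
    """Classify a segment as uplink (to the server), downlink (from it), or other."""
    if seg.dst == WEB_SERVER:
        return "uplink"
    if seg.src == WEB_SERVER:
        return "downlink"
    return "other"
```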
S2: analyze the packets and extract the HTTP protocol packets. Specifically, messages belonging to the HTTP protocol are accurately identified by analyzing the content of the TCP payload in the uplink and downlink traffic. Because an HTTP exchange is initiated by a Request message, the HTTP protocol analysis system first isolates the Request message and then finds the reply to it, delivers the Request and Response messages to the Request and Response analysis systems respectively, and establishes the correspondence between each Request message and its Response message.
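HTTP can be recognized from the TCP payload itself rather than from the port number, which matters when a site runs on a port other than 80. The sketch below is an assumption, not the patent's actual analysis: it only checks whether a payload begins with an HTTP/1.x request method or status line. It therefore sees only the first segment of each message; continuation segments, HTTP/2, and TLS-encrypted HTTPS traffic (which a passive bypass device generally cannot decode) all fall through to "other".

```python
# Request-line prefixes for the standard HTTP/1.x methods.
HTTP_METHODS = tuple(
    m.encode() + b" "
    for m in ("GET", "POST", "HEAD", "PUT", "DELETE", "OPTIONS", "PATCH", "TRACE", "CONNECT")
)
STATUS_PREFIXES = (b"HTTP/1.0 ", b"HTTP/1.1 ")


def classify_payload(payload: bytes) -> str:
    """Label a TCP payload as the start of an HTTP request, an HTTP response, or other."""
    if payload.startswith(HTTP_METHODS):
        return "request"
    if payload.startswith(STATUS_PREFIXES):
        return "response"
    return "other"  # continuation segment, TLS, HTTP/2, or a non-HTTP protocol
```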
S3: analyze the Request message data in the HTTP protocol packets to obtain the Request message necessary information. Specifically, the Request messages among the HTTP protocol messages are decoded, and the Request necessary information is extracted and buffered. The Request message necessary information includes the visitor's IP address, the specific domain name and URL accessed, the Referer information, the User-Agent, the Cookies carried, and similar information;
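The Request necessary information maps onto the HTTP request line and a few headers: the domain name is the Host header, the Referer information is the Referer header, and the cookies are the Cookie header. A minimal parser, assuming the complete request head (everything up to the blank line) is available in one buffer, could look like this. The RequestInfo type and its field names are hypothetical, and the request method is kept as well because it is one of the screening conditions listed for the log storage server.

```python
from dataclasses import dataclass


@dataclass
class RequestInfo:  # hypothetical container for the "Request necessary information"
    client_ip: str
    method: str
    host: str
    url: str
    referer: str
    user_agent: str
    cookies: str


def parse_headers(lines):
    """Turn 'Name: value' lines into a dict keyed by lower-case header name."""
    headers = {}
    for line in lines:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def parse_request(client_ip: str, raw: bytes) -> RequestInfo:
    """Extract the fields of interest from a complete HTTP/1.x request head."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    request_line, *header_lines = head.split("\r\n")
    method, url, _version = request_line.split(" ", 2)  # raises ValueError if malformed
    h = parse_headers(header_lines)
    return RequestInfo(client_ip, method, h.get("host", ""), url,
                       h.get("referer", ""), h.get("user-agent", ""), h.get("cookie", ""))
```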
S4: analyze the Response message data in the HTTP protocol packets to obtain the Response message necessary information. Specifically, the Response messages among the HTTP protocol messages are decoded, and the Response necessary information is extracted and buffered. The Response message necessary information includes the response status code, the content type carried, the message length, and similar information.
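Likewise, the Response necessary information comes from the status line and the Content-Type and Content-Length headers. The sketch below makes the same assumption that the full response head is in one buffer and uses a hypothetical ResponseInfo type. When a response uses chunked transfer encoding there is no Content-Length, so the length is left as None here rather than computed by reassembling the body.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResponseInfo:  # hypothetical container for the "Response necessary information"
    status: int
    content_type: str
    length: Optional[int]  # None when Content-Length is absent (e.g. chunked)


def parse_response(raw: bytes) -> ResponseInfo:
    """Extract status code, content type and length from an HTTP/1.x response head."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    status_line, *header_lines = head.split("\r\n")
    _version, code, *_reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    length = headers.get("content-length", "")
    return ResponseInfo(int(code), headers.get("content-type", ""),
                        int(length) if length.isdigit() else None)
```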
S5: analyze the Request message necessary information and/or the Response message necessary information to obtain the access behavior type. Specifically, the visitor's access behavior is analyzed according to the information carried by the Request and Response messages to determine its behavior type. The access behavior types include normal visit, crawler, attack, and others.
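The rules used by the behavior analysis module are not disclosed. The following is only an illustrative heuristic with invented signatures: it labels a request as an attack if the percent-decoded URL contains a few well-known injection or traversal patterns, as a crawler if the User-Agent identifies itself as a bot, and as a normal visit otherwise. Real classification would need far richer rules (request rates, response codes, known crawler address ranges), and a rule set this small will produce both false positives and false negatives.

```python
import re
from urllib.parse import unquote_plus

# Illustrative signatures only (hypothetical, not from the patent).
ATTACK_PATTERNS = [
    re.compile(r"(?i)union\s+select"),  # SQL injection probe
    re.compile(r"(?i)<script"),         # reflected XSS probe
    re.compile(r"\.\./"),               # path traversal
]
CRAWLER_UA = re.compile(r"(?i)bot|spider|crawler")


def classify_behavior(url: str, user_agent: str) -> str:
    """Return 'attack', 'crawler' or 'normal' for one request."""
    decoded = unquote_plus(url)  # attackers often percent-encode payloads
    if any(p.search(decoded) for p in ATTACK_PATTERNS):
        return "attack"
    if CRAWLER_UA.search(user_agent):
        return "crawler"
    return "normal"
```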
S6: compare the Request message necessary information and/or the Response message necessary information and/or the access behavior type with the preset conditions; if the preset conditions are met, buffer the Request message and wait for the corresponding Response message; once that Response message has been obtained, combine the corresponding Request and Response messages into a complete access process and save it in the preset format to a database and/or file as a web log. Specifically, the Request necessary information and/or the Response necessary information and/or the access behavior type are compared with the preset log conditions; if the conditions are met, the Request packet containing the Request necessary information is buffered until the corresponding Response message arrives. The Request necessary information of the Request message and the Response necessary information of the corresponding Response message are then merged into one complete access process, which is combined with the log entries according to the preset log format into a final web log; the web log is written to the database and/or log file, and a search index is built for it.
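Pairing requests with responses, as S6 describes, is straightforward for HTTP/1.x because responses on one TCP connection come back in the order the requests were sent. The sketch below assumes that ordering and keeps one FIFO queue per connection. One subtlety: a request that fails the preset conditions must still leave a placeholder in the queue, otherwise its response would be matched to the next accepted request. The class name, the dict-shaped records, and the two condition callables are hypothetical; the in-memory list stands in for the database or log file, and index building is omitted.

```python
from collections import defaultdict, deque
from typing import Callable, Optional

Conn = tuple  # (client_ip, client_port, server_ip, server_port)


class RequestResponsePairer:
    """Hypothetical S6 stage: buffer requests, match responses, emit complete accesses."""

    def __init__(self, request_ok: Callable[[dict], bool],
                 response_ok: Callable[[dict, dict], bool]):
        self.request_ok = request_ok    # preset conditions on request-side fields
        self.response_ok = response_ok  # preset conditions that need the response too
        self.pending = defaultdict(deque)  # conn -> FIFO of buffered requests or None
        self.records = []                  # stand-in for the database / log file

    def on_request(self, conn: Conn, req: dict) -> None:
        # A rejected request still occupies a slot so later responses stay aligned.
        self.pending[conn].append(req if self.request_ok(req) else None)

    def on_response(self, conn: Conn, resp: dict) -> Optional[dict]:
        queue = self.pending.get(conn)
        if not queue:
            return None  # no request seen, e.g. capture started mid-connection
        req = queue.popleft()
        if req is None or not self.response_ok(req, resp):
            return None
        record = {**req, **resp}  # one "complete access process"
        self.records.append(record)
        return record
```

Interim 1xx responses (such as 100 Continue) would break the one-response-per-request assumption and would have to be skipped before calling `on_response`.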
To make the stored web logs more useful, after they have been stored by the above steps, they can also be screened by the following step.
S7: according to screening conditions set by the requesting client, filter the qualifying log records from the database and/or file, save them as a new file, and return it to the requesting client.
The log format specifies which entries are recorded in the log, the order in which they appear, and their formats. The common web log formats at present are mainly the NCSA log format and the W3C log format, adopted by Apache and IIS respectively; each has finer subdivisions, which are not described here.
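To illustrate how one captured access can be rendered in either family of formats regardless of which web server served it, here is a sketch of the NCSA common and combined layouts and a small W3C extended line. The function names and the chosen W3C field list are assumptions. `%b` yields English month abbreviations only under the default C locale, and W3C extended logs conventionally record time in UTC, so `w3c_line` expects a timezone-aware datetime.

```python
from datetime import datetime, timezone
from typing import Optional


def ncsa_common(host: str, when: datetime, request_line: str,
                status: int, size: Optional[int]) -> str:
    """NCSA Common Log Format: host ident authuser [date] "request" status bytes."""
    stamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")
    size_field = "-" if size is None else str(size)
    return f'{host} - - [{stamp}] "{request_line}" {status} {size_field}'


def ncsa_combined(host: str, when: datetime, request_line: str, status: int,
                  size: Optional[int], referer: str, user_agent: str) -> str:
    """NCSA Combined: the common format plus quoted Referer and User-Agent."""
    extra = f' "{referer or "-"}" "{user_agent or "-"}"'
    return ncsa_common(host, when, request_line, status, size) + extra


W3C_HEADER = "#Fields: date time c-ip cs-method cs-uri-stem sc-status"


def w3c_line(when: datetime, host: str, method: str, path: str, status: int) -> str:
    """One W3C extended log line matching W3C_HEADER; time is converted to UTC."""
    utc = when.astimezone(timezone.utc)
    return f"{utc:%Y-%m-%d %H:%M:%S} {host} {method} {path} {status}"
```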
In addition, because this scheme uses a dedicated log storage server as the log storage device, the preset conditions, preset formats, screening conditions, and so on can be set through a web management page on that server. The preset format can be NCSA common, NCSA combined, a W3C template, an Apache custom format, a W3C custom format, and so on. The screening conditions can be the response status (e.g., 200, 304), request method (e.g., GET), source IP, destination IP, excluded IP, URL rule, content type (e.g., images), and behavior category (e.g., normal visit, crawler, attack), and these conditions can also be combined. Because screening conditions are easy to set, the required log content can be obtained quickly, with no need to search the logs like looking for a needle in a haystack, which improves working efficiency.
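The screening conditions listed above combine naturally as a conjunction in which unset conditions are ignored. The sketch below assumes each stored log record is a dict with the hypothetical keys client_ip, server_ip, method, url, status, content_type and behavior, and it interprets the source-IP condition as a CIDR network and the URL rule as a regular expression; both interpretations are assumptions.

```python
import ipaddress
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LogFilter:
    """Hypothetical screening conditions; empty/None means 'no constraint'."""
    statuses: set = field(default_factory=set)   # e.g. {200, 304}
    methods: set = field(default_factory=set)    # e.g. {"GET"}
    source_net: Optional[str] = None             # e.g. "10.0.0.0/8"
    dest_ips: set = field(default_factory=set)
    exclude_ips: set = field(default_factory=set)
    url_regex: Optional[str] = None              # the "URL rule"
    content_type_prefix: Optional[str] = None    # e.g. "image/"
    behaviors: set = field(default_factory=set)  # e.g. {"attack"}

    def matches(self, rec: dict) -> bool:
        """True if the record satisfies every configured condition (logical AND)."""
        return all((
            not self.statuses or rec["status"] in self.statuses,
            not self.methods or rec["method"] in self.methods,
            self.source_net is None
            or ipaddress.ip_address(rec["client_ip"]) in ipaddress.ip_network(self.source_net),
            not self.dest_ips or rec["server_ip"] in self.dest_ips,
            rec["client_ip"] not in self.exclude_ips,
            self.url_regex is None or re.search(self.url_regex, rec["url"]) is not None,
            self.content_type_prefix is None
            or rec["content_type"].startswith(self.content_type_prefix),
            not self.behaviors or rec["behavior"] in self.behaviors,
        ))


def screen(records, flt: LogFilter) -> list:
    """S7: return the records that pass the filter, ready to be saved as a new file."""
    return [r for r in records if flt.matches(r)]
```

For instance, `LogFilter(statuses={200, 304}, content_type_prefix="image/")` selects successful or cached image requests only.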
As shown in Fig. 3, the utility model discloses a device for storing logs using the web log storage system, comprising:
a traffic collection module, for obtaining through the mirror port all packets received and sent by the web server;
an HTTP protocol analysis module, for analyzing the packets and extracting the HTTP protocol packets from them;
a Request message analysis module, for analyzing the Request message data in the HTTP protocol packets to obtain the Request message necessary information;
a Response message analysis module, for analyzing the Response message data in the HTTP protocol packets to obtain the Response message necessary information;
a behavior analysis module, for analyzing the Request message necessary information and/or the Response message necessary information to obtain the access behavior type;
a log condition check module, for comparing the Request message necessary information and/or the Response message necessary information and/or the access behavior type with the preset conditions and, if they are met, passing the data to the next processing step;
a web log storage module, for saving the complete access process in a preset format to a database and/or log file as a web log.
The device further comprises a web log screening module, for screening web logs according to specified conditions.
By adopting the technical solution disclosed by the utility model, the following beneficial effects are obtained:
1. Recording and storing web logs has no effect on the website: no website configuration needs to be changed and no web pages need to be rewritten, so the system is plug-and-play;
2. Because data collection is performed by the traffic collection module on a bypass device, the scheme does not degrade web server performance, allowing the web server to save resources, handle more concurrent requests, and compute faster;
3. The behavior analysis module classifies accesses intelligently, so attacks, crawlers, normal visits, and so on are easy to tell apart;
4. The log format of the scheme is independent of the web server in use; for example, logs in W3C format can be obtained even with an Apache server;
5. The log filter module of the utility model can directly output to the user the log content that matches the user's request.
The above is only a preferred embodiment of the utility model. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the utility model, and these improvements and modifications should also be regarded as falling within the scope of protection of the utility model.

Claims (1)

1. A web log storage system comprising a firewall, a network switch, and a web server, characterized in that the network switch is a network switch having a mirror port, to which a log storage server is connected; the mirror port is used to obtain, by traffic mirroring, the communication data of the switch's communication ports and to deliver it to the log storage server.
CN 201220389766 2012-08-07 2012-08-07 Web log storage system Expired - Fee Related CN203039704U (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201220389766 CN203039704U (en) 2012-08-07 2012-08-07 Web log storage system

Publications (1)

Publication Number Publication Date
CN203039704U true CN203039704U (en) 2013-07-03

Family

ID=48691789

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201220389766 Expired - Fee Related CN203039704U (en) 2012-08-07 2012-08-07 Web log storage system

Country Status (1)

Country Link
CN (1) CN203039704U (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103944995A (en) * 2014-04-28 2014-07-23 东华大学 Method for recognizing accounts of independent users in broadband network
CN103944995B (en) * 2014-04-28 2017-06-06 东华大学 Method for identifying independent user accounts in a broadband network
CN109600254A (en) * 2018-11-29 2019-04-09 恒生电子股份有限公司 Method for generating full-link logs and related system
CN109600254B (en) * 2018-11-29 2022-04-26 恒生电子股份有限公司 Method for generating full-link log and related system

Legal Events

Date Code Title Description
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130703

Termination date: 20140807

EXPY Termination of patent right or utility model