CN105245394A - Method and equipment for analyzing network access log based on layered approach - Google Patents

Publication number
CN105245394A
Authority
CN
China
Prior art keywords
access
log
analysis
real
regular expression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410320752.6A
Other languages
Chinese (zh)
Inventor
彭晓涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING FENGXING ONLINE TECHNOLOGY Co Ltd
Original Assignee
BEIJING FENGXING ONLINE TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING FENGXING ONLINE TECHNOLOGY Co Ltd
Priority to CN201410320752.6A
Publication of CN105245394A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention aims to provide a method and equipment for analyzing a network access log based on a layered approach. The method comprises the steps as follows: an acquisition layer acquires a network access log, and sends the network access log to a corresponding analysis layer; the analysis layer analyzes the acquired network access log in real time to obtain a corresponding analysis result; and a report layer shows the analysis result to users in the form of a report. Compared with the prior art, the network access log is analyzed in real time based on the layered approach, and the analysis result is shown to users in the form of a report. A real-time Internet HTTP business analysis method is provided, and the service condition of the Internet HTTP business can be reflected objectively and accurately based on the analysis indicators of population statistics. The problem that big Internet companies need to monitor online HTTP business in a real-time and customizable manner is solved.

Description

A method and equipment for analyzing network access logs based on a layered approach
Technical Field
The present invention relates to the field of computer technology, and in particular to a technique for analyzing network access logs based on a layered approach.
Background
Modern Internet companies place high demands on the quality of their web services; every minute of downtime may cause an Internet company a huge loss.
Large-scale web applications can have tens of millions of users online at the same time. Most of these web applications use a distributed deployment, for example with large numbers of HTTP cache devices and HTTP application servers deployed in provinces and cities across the country. How to complete the collection, merging, analysis, and report output of these user access logs in as short a time as possible has become one of the problems that those skilled in the art urgently need to solve.
Summary of the Invention
The object of the present invention is to provide a method and equipment for analyzing network access logs based on a layered approach.
According to one aspect of the present invention, a method for analyzing network access logs based on a layered approach is provided, wherein the method comprises the following steps:
a. an acquisition layer acquires a network access log and sends it to a corresponding analysis layer;
b. the analysis layer analyzes the acquired network access log in real time to obtain a corresponding analysis result;
c. a report layer presents the analysis result to a user in the form of a report.
According to another aspect of the present invention, equipment for analyzing network access logs based on a layered approach is also provided, wherein the equipment comprises:
an acquisition layer device, for acquiring a network access log and sending it to a corresponding analysis layer device;
an analysis layer device, for analyzing the acquired network access log in real time to obtain a corresponding analysis result;
a report layer device, for presenting the analysis result to a user in the form of a report.
Compared with the prior art, the present invention analyzes network access logs in real time based on a layered approach and presents the analysis results to users as reports. It provides a real-time method for analyzing Internet HTTP business which, using analysis indicators based on population statistics, can reflect the service condition of Internet HTTP business objectively and accurately. The invention solves the problem that large Internet companies need to monitor online HTTP business in a real-time and customizable manner.
Further, the present invention adopts a layered design to reduce coupling, dividing the system into an acquisition layer, an analysis layer, and a report layer. The analysis layer monitors the network in real time based on URL regular expressions and can analyze, in real time, the user access logs of a web application across all locations. Business staff and developers alike can add URL regular expressions on a self-service basis, and the terminal report system can output the corresponding request volume, timing, and fault information within a short time. Furthermore, the present invention can add more visual tracking charts, making it easier for users to view and use the analysis results.
Brief Description of the Drawings
Other features, objects, and advantages of the present invention will become more apparent upon reading the detailed description of non-limiting embodiments made with reference to the following drawings:
Fig. 1 shows a schematic diagram of equipment for analyzing network access logs based on a layered approach according to one aspect of the present invention;
Fig. 2 shows a schematic diagram of equipment for analyzing network access logs based on a layered approach according to a preferred embodiment of the present invention;
Fig. 3 shows a flow chart of a method for analyzing network access logs based on a layered approach according to another aspect of the present invention;
Fig. 4 shows a flow chart of a method for analyzing network access logs based on a layered approach according to a preferred embodiment of the present invention.
In the drawings, the same or similar reference numerals denote the same or similar components.
Detailed Description
The present invention is described in further detail below with reference to the drawings.
Fig. 1 shows a schematic diagram of equipment for analyzing network access logs based on a layered approach according to one aspect of the present invention. The equipment 1 comprises an acquisition layer device 101, an analysis layer device 102, and a report layer device 103.
The acquisition layer device 101 acquires network access logs and sends them to the corresponding analysis layer device 102. Specifically, the acquisition layer device 101 acquires network access logs from the whole network by means such as real-time acquisition, periodic acquisition, or event-triggered acquisition, and sends the acquired logs to the corresponding analysis layer.
For example, the acquisition layer device 101 consists of several crawlers that periodically, e.g., once per minute, go to machine rooms in various locations to crawl network access logs and send them in real time to the upstream analysis layer device 102. Here, Kafka, for example, serves as the communication bridge between the acquisition layer device 101 and the analysis layer device 102. The crawlers are scheduled by a central controller; using crawlers makes it possible to adapt automatically to various network environments and to handle short-lived network congestion automatically.
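The hand-off from crawlers to the analysis layer can be sketched as follows. This is a minimal illustration only: the standard-library queue.Queue stands in for a Kafka topic, and the fetcher functions, the log line format, and the rule of skipping an unreachable machine room until the next round are all assumptions made for the example, not details given in the patent.

```python
import queue

def acquisition_layer(fetchers, bus):
    """Run one crawl round: call every fetcher and publish its log lines to the bus."""
    published = 0
    for fetch in fetchers:
        try:
            lines = fetch()
        except OSError:
            # Assumed policy: a briefly unreachable machine room is skipped this round.
            continue
        for line in lines:
            bus.put(line)
            published += 1
    return published

def drain(bus):
    """Analysis-layer side: take every line currently waiting on the bus."""
    lines = []
    while True:
        try:
            lines.append(bus.get_nowait())
        except queue.Empty:
            return lines

def beijing():  # hypothetical crawler for one machine room
    return ["GET http://a.example.com/play/1 200"]

def shanghai():  # hypothetical crawler whose network is congested this round
    raise TimeoutError("network congestion")

bus = queue.Queue()  # in-process stand-in for a Kafka topic
sent = acquisition_layer([beijing, shanghai], bus)
received = drain(bus)
```

In a real deployment a central controller would presumably invoke such a round on a timer, for example once per minute as the text suggests.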
Those skilled in the art will understand that the above manner of acquiring network access logs is only an example; other existing or future manners of acquiring network access logs, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
The analysis layer device 102 analyzes the acquired network access log in real time to obtain a corresponding analysis result. Specifically, the analysis layer device 102 analyzes in real time the network access log obtained from the acquisition layer device 101, for example based on URL regular expressions; or, further, it first deduplicates the network access log and then analyzes the deduplicated log in real time based on URL regular expressions, to obtain the corresponding analysis result.
Here, a network access log is a file in which a web server records various raw information, such as the requests it receives and processes and its runtime errors; it contains the web page addresses (URLs) that network users request. A URL consists of three parts: protocol, domain name, and request address. A complete URL uniquely identifies one requested resource, such as a page, content module, file, or multimedia resource. By extracting the information in a URL, one can learn which web content a network user has accessed; by analyzing the URLs in the network access logs of the whole network, one can learn how the various web resources are accessed, for example their access counts and access frequency.
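As an illustration of the three-part URL structure and of counting accesses, the following sketch uses Python's standard urllib.parse and collections.Counter. The log line layout ("timestamp status url") and the sample URLs are assumptions for the example; the patent does not specify a log format.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical log lines in an assumed "<timestamp> <status> <url>" layout.
SAMPLE_LOG = [
    "2014-07-07T10:00:01 200 http://video.example.com/play/123?src=home",
    "2014-07-07T10:00:02 200 http://video.example.com/play/123?src=home",
    "2014-07-07T10:00:03 404 http://img.example.com/cover/9.jpg",
]

def split_url(url):
    """Return (protocol, domain name, request address) for a URL."""
    parts = urlsplit(url)
    request = parts.path + ("?" + parts.query if parts.query else "")
    return parts.scheme, parts.netloc, request

def count_accesses(lines):
    """Count how many times each URL appears in the log lines."""
    return Counter(line.split(" ", 2)[2] for line in lines)

counts = count_accesses(SAMPLE_LOG)
```

Dividing such counts by the length of the observation window would give an access frequency.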
Here, the analysis layer device 102 may comprise, for example, a Kafka log receiving queue, a Storm real-time analysis cluster, an HBase analysis-result preprocessing cluster, a MySQL result-report storage cluster, and so on.
The Kafka log receiving queue is used to obtain network access logs from the acquisition layer device 101.
The Storm real-time analysis cluster is used to analyze the acquired network access logs in real time. Storm is a real-time, distributed, highly fault-tolerant computation system that can process large volumes of data and, while ensuring high reliability, allows processing to be carried out closer to real time. Its advantages include easy scaling, a guarantee that every message is processed, simple cluster management, high fault tolerance, and support for processing logic written in any language.
The HBase analysis-result preprocessing cluster is used to preprocess the analysis results obtained by the analysis layer device 102. HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system. Unlike a general relational database, it is a database suited to storing unstructured data, and it is column-based rather than row-based. With HBase, large-scale structured storage clusters can be built on inexpensive PC servers.
The MySQL result-report storage cluster is used to store analysis results in the form of report tables. In this way, the report layer device 103 can use the report data in MySQL directly to present it to the user, without being concerned with the underlying collection and analysis workflows.
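The idea that the report layer reads finished report rows without knowing how they were produced can be sketched as follows. The example uses Python's built-in sqlite3 as a stand-in for MySQL (so the SQL dialect, including INSERT OR REPLACE, is SQLite's), and the table name url_report and its columns are hypothetical.

```python
import sqlite3

def store_results(conn, rows):
    """Analysis-layer side: write (minute, rule, requests) rows into a report table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS url_report ("
        "minute TEXT, rule TEXT, requests INTEGER, PRIMARY KEY (minute, rule))"
    )
    conn.executemany("INSERT OR REPLACE INTO url_report VALUES (?, ?, ?)", rows)
    conn.commit()

def load_report(conn, rule):
    """Report-layer side: read ready-made rows; no knowledge of collection or analysis."""
    cursor = conn.execute(
        "SELECT minute, requests FROM url_report WHERE rule = ? ORDER BY minute",
        (rule,),
    )
    return cursor.fetchall()

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL storage cluster
store_results(conn, [
    ("2014-07-07T10:00", "playback", 120),
    ("2014-07-07T10:01", "playback", 95),
    ("2014-07-07T10:00", "images", 40),
])
playback_rows = load_report(conn, "playback")
```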
Those skilled in the art will understand that the above manner of analyzing network access logs in real time is only an example; other existing or future manners of analyzing network access logs in real time, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
The report layer device 103 presents the analysis result to the user in the form of a report. Specifically, the report layer device 103 takes the analysis result obtained by the real-time analysis of the analysis layer device 102 and presents it to the user as a report, for example by means of charting technology. For example, assuming the MySQL result-report storage cluster of the analysis layer device 102 stores analysis results as report tables, the report layer device 103 obtains the report data directly from MySQL and presents it to the user as a report.
Those skilled in the art will understand that the above manner of presenting analysis results is only an example; other existing or future manners of presenting analysis results, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Preferably, the devices of the equipment 1 work continuously with one another. Specifically, the acquisition layer device 101 acquires network access logs and sends them to the corresponding analysis layer device; the analysis layer device 102 analyzes the acquired network access logs in real time to obtain the corresponding analysis results; the report layer device 103 presents the analysis results to the user as reports. Here, those skilled in the art will understand that "continuously" means that each device of the equipment 1, according to a configured or dynamically adjusted operating mode, respectively acquires network access logs, performs real-time analysis, and presents the analysis results to the user as reports.
Here, the equipment 1 analyzes network access logs in real time based on a layered approach and presents the analysis results to users as reports. It provides a real-time method for analyzing Internet HTTP business which, using analysis indicators based on population statistics, can reflect the service condition of Internet HTTP business objectively and accurately, solving the problem that large Internet companies need to monitor online HTTP business in a real-time and customizable manner.
Preferably, the analysis layer device 102 analyzes the network access log in real time based on URL regular expressions to obtain the corresponding analysis result. Specifically, the analysis layer device 102 uses URL regular expressions to analyze in real time the network access log obtained from the acquisition layer device 101, for example by matching the URLs in the log against the URL regular expressions, to obtain the corresponding analysis result.
Here, a URL regular expression may be entered by the user in real time, or the analysis layer device 102 may obtain it from an expression library that stores URL regular expressions, for example by selecting a suitable expression from the library according to actual conditions.
Here, a regular expression is a tool for text matching; it usually uses a single string to describe and match a set of strings that conform to a certain syntactic rule. A regular expression is a logical formula for operating on strings: predefined special characters, and combinations of them, form a "rule string" that expresses a filtering logic over strings.
Here, the URLs in the network access log are matched against a URL regular expression that contains a preset keyword. If a URL matches, it contains the keyword; if it does not match, it does not. By matching a URL against at least one URL regular expression, the information contained in the URL, or the category of that information, can be determined.
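The matching step described above can be sketched with Python's standard re module. The rule names and expressions below are hypothetical examples of "URL regular expressions containing a preset keyword"; the patent does not give concrete expressions.

```python
import re

# Hypothetical rules: category name -> URL regular expression containing a keyword.
URL_RULES = {
    "playback": re.compile(r"^https?://[^/]+/play/"),
    "images": re.compile(r"\.(?:jpg|png)(?:\?|$)"),
}

def classify(url, rules=URL_RULES):
    """Return the sorted names of every rule whose expression matches the URL."""
    return sorted(name for name, pattern in rules.items() if pattern.search(url))

playback_only = classify("http://video.example.com/play/123")
both = classify("http://video.example.com/play/poster.png")
unmatched = classify("http://example.com/")
```

A URL may match several expressions at once, which is why the sketch returns a list of categories rather than a single one.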
Here, the equipment 1 mines HTTP network access logs for value in real time according to URL regular expressions entered by the user or taken from the expression library. As a core component of the HTTP protocol, the URL is very closely tied to online business. Building regular expressions at the URL level satisfies a wide range of flexibility requirements: an expression can span multiple domain names, or it can be narrowed down to a single page.
More preferably, the analysis layer device 102 obtains URL regular expressions from an expression library and, based on those expressions, analyzes the network access log in real time to obtain the corresponding analysis result. Specifically, URL regular expressions may be stored in an expression library, which users can create or update by entering URL regular expressions. The analysis layer device 102 obtains URL regular expressions by interacting with the library, for example through one or more calls to an application programming interface (API) provided by the library or through another agreed communication method, choosing suitable expressions from the library according to actual conditions. The analysis layer device 102 then uses the obtained expressions to analyze in real time the network access log obtained from the acquisition layer device 101, to obtain the corresponding analysis result.
Here, the expression library stores URL regular expressions; it may be located in the equipment 1 or in a third-party device connected to the equipment 1 over a network.
Preferably, the equipment 1 further comprises an updating device (not shown), which obtains URL regular expressions newly added by the user and creates or updates the expression library according to them. Specifically, the user can enter a URL regular expression at any time. The analysis layer device 102 can use this expression directly to analyze the network access log in real time; the updating device can also obtain the newly added expression and store it in the expression library, thereby creating or updating the library.
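One way to picture an expression library that users can extend at any time is the small in-memory class below. The class name ExpressionLibrary, its methods, and the sample expressions are hypothetical; the patent only says that the library stores URL regular expressions and can be created or updated. Rejecting invalid expressions at insertion time is an added assumption.

```python
import re

class ExpressionLibrary:
    """In-memory stand-in for the expression library (names are hypothetical)."""

    def __init__(self):
        self._compiled = {}

    def add(self, name, expression):
        """Create or update an entry; re.compile raises re.error on invalid input."""
        self._compiled[name] = re.compile(expression)

    def names(self):
        """Return the names of all stored expressions."""
        return sorted(self._compiled)

    def match(self, url):
        """Return the names of all stored expressions that match the URL."""
        return sorted(n for n, p in self._compiled.items() if p.search(url))

library = ExpressionLibrary()
library.add("playback", r"/play/\d+")
library.add("search", r"/search\?q=")
library.add("playback", r"/(?:play|watch)/\d+")  # a user updates an existing entry
```

Because an invalid expression raises before the dictionary is modified, a bad entry never reaches the analysis layer in this sketch.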
Those skilled in the art will understand that the above manner of creating or updating the expression library is only an example; other existing or future manners of creating or updating the expression library, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Preferably, the equipment 1 uses big data processing technology similar to Hadoop but places more emphasis on real-time operation, applying the concept of streams to filter data continuously. It is also horizontally scalable, which allows it to cope with rapid growth in network traffic. The equipment 1 can also merge and store valuable data: stream data is merged and stored in cache devices for persistent storage, which ensures the performance of the final data output and supports future queries.
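The stream-style "filter, then merge" idea can be illustrated with Python generators. This is only a single-process sketch under assumed inputs: events are (timestamp, rule) pairs with ISO-style timestamps, and merging means counting per minute and rule. A real deployment would run the equivalent logic on a distributed stream processor.

```python
from collections import Counter

def keep_matching(events, wanted_rules):
    """Filter the stream lazily, passing on only events for rules of interest."""
    for timestamp, rule in events:
        if rule in wanted_rules:
            yield timestamp, rule

def merge_by_minute(events):
    """Merge a stream of (timestamp, rule) events into per-minute request counts."""
    merged = Counter()
    for timestamp, rule in events:
        minute = timestamp[:16]  # "YYYY-MM-DDTHH:MM"
        merged[(minute, rule)] += 1
    return merged

EVENTS = [  # hypothetical matched events
    ("2014-07-07T10:00:01", "playback"),
    ("2014-07-07T10:00:59", "playback"),
    ("2014-07-07T10:00:30", "heartbeat"),
    ("2014-07-07T10:01:00", "playback"),
]
merged = merge_by_minute(keep_matching(EVENTS, {"playback"}))
```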
Here, the equipment 1 adopts a layered design to reduce coupling, dividing the system into an acquisition layer, an analysis layer, and a report layer. The analysis layer monitors the network in real time based on URL regular expressions and can analyze, in real time, the user access logs of a web application across all locations. Business staff and developers alike can add URL regular expressions on a self-service basis, and the terminal report system can output the corresponding request volume, timing, and fault information within a short time.
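What "request volume, timing, and fault information" for one URL regular expression might look like can be sketched as below. The Record fields, the choice of average response time as the timing metric, and the treatment of HTTP status 500 and above as a fault are all assumptions for illustration; the patent does not define these metrics.

```python
import re
from dataclasses import dataclass

@dataclass
class Record:
    """One parsed access-log entry (field names are hypothetical)."""
    url: str
    status: int
    millis: float

def summarize(records, expression):
    """Report request count, average response time, and fault count for one rule."""
    pattern = re.compile(expression)
    hits = [r for r in records if pattern.search(r.url)]
    errors = sum(1 for r in hits if r.status >= 500)  # assumed definition of a fault
    average = sum(r.millis for r in hits) / len(hits) if hits else 0.0
    return {"requests": len(hits), "avg_millis": average, "errors": errors}

RECORDS = [
    Record("http://v.example.com/play/1", 200, 100.0),
    Record("http://v.example.com/play/2", 502, 300.0),
    Record("http://v.example.com/search?q=x", 200, 50.0),
]
summary = summarize(RECORDS, r"/play/\d+")
```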
Preferably, the report layer device 103 presents the analysis result to the user as a report based on a real-time charting system. Specifically, the report layer device 103 takes the analysis result obtained by the real-time analysis of the analysis layer device 102 and presents it to the user as a report based on a real-time charting system, for example one developed with Django, Cacti, or the like.
Here, data is output graphically, so a browser is all that is needed to see the state of HTTP business across the whole network. The report layer device 103 uses, for example, the latest browser-side drawing technology and does not rely on third-party plug-ins such as Flash, which solves the problem of real-time output. The drawing technology is built, for example, on HTML5 Canvas and can make full use of the GPU acceleration of modern computers.
Here, the equipment 1 presents the analysis result to the user as a report based on a real-time charting system, making it easier for the user to view and use the analysis result.
Fig. 2 shows a schematic diagram of equipment for analyzing network access logs based on a layered approach according to a preferred embodiment of the present invention. The equipment 1 further comprises a deduplication device 204. The preferred embodiment is described in detail below with reference to Fig. 2. Specifically, the acquisition layer device 201 acquires network access logs and sends them to the corresponding analysis layer device; the deduplication device 204 deduplicates the network access log to obtain a deduplicated network access log; the analysis layer device 202 analyzes the deduplicated network access log in real time based on URL regular expressions to obtain the corresponding analysis result; the report layer device 203 presents the analysis result to the user as a report. The acquisition layer device 201 and the report layer device 203 are identical or substantially identical to the corresponding devices shown in Fig. 1, so they are not described again here and are incorporated herein by reference.
The deduplication device 204 deduplicates the network access log to obtain a deduplicated network access log. Specifically, in the network access logs of the whole network the same URL is accessed repeatedly a very large number of times; analyzing every URL occurrence in these logs in real time would reduce the efficiency of analysis. The deduplication device 204 therefore deduplicates the network access log. For example, a URL table is created, and the URLs in the network access log are taken out one at a time; for each URL, it is determined whether the URL already exists in the URL table. If it does not exist, the URL is added to the table; if it exists, it is not added. This achieves the deduplication of the network access log, and the URLs finally held in the URL table constitute the deduplicated network access log.
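The URL-table procedure just described maps directly onto a few lines of Python. In this sketch the table is a dict, chosen because membership tests are fast and insertion order is preserved; that choice of data structure is an assumption, since the patent does not say how the table is implemented.

```python
def deduplicate(urls):
    """Build a URL table by adding each URL only if it is not already present."""
    url_table = {}
    for url in urls:
        if url not in url_table:
            url_table[url] = True
    # The table's keys, in first-seen order, are the deduplicated log.
    return list(url_table)

LOG_URLS = [  # hypothetical URLs extracted from an access log
    "http://v.example.com/play/1",
    "http://v.example.com/play/2",
    "http://v.example.com/play/1",
    "http://v.example.com/play/1",
]
unique_urls = deduplicate(LOG_URLS)
```

Note that plain deduplication discards how often each URL occurred; if access counts are needed later, the table would have to store a counter per URL instead of a flag.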
Those skilled in the art will understand that the above manner of deduplicating network access logs is only an example; other existing or future manners of deduplicating network access logs, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Subsequently, the analysis layer device 202 analyzes the deduplicated network access log in real time based on URL regular expressions to obtain the corresponding analysis result. Specifically, the analysis layer device 202 uses URL regular expressions entered by the user or taken from the expression library to analyze the deduplicated network access log in real time, for example by matching the deduplicated URLs against the URL regular expressions and taking the matching result as the corresponding analysis result.
Here, the equipment 1 deduplicates the network access log and analyzes the deduplicated log in real time, which further improves the efficiency of analysis.
Fig. 3 shows a flow chart of a method for analyzing network access logs based on a layered approach according to another aspect of the present invention.
In step S301, the acquisition layer acquires network access logs and sends them to the corresponding analysis layer. Specifically, in step S301, the acquisition layer acquires network access logs from the whole network by means such as real-time acquisition, periodic acquisition, or event-triggered acquisition, and sends the acquired logs to the corresponding analysis layer.
For example, in step S301, the acquisition layer consists of several crawlers that periodically, e.g., once per minute, go to machine rooms in various locations to crawl network access logs and send them in real time to the upstream analysis layer. Here, Kafka, for example, serves as the communication bridge between the acquisition layer and the analysis layer. The crawlers are scheduled by a central controller; using crawlers makes it possible to adapt automatically to various network environments and to handle short-lived network congestion automatically.
Those skilled in the art will understand that the above manner of acquiring network access logs is only an example; other existing or future manners of acquiring network access logs, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
In step S302, the analysis layer analyzes the acquired network access log in real time to obtain a corresponding analysis result. Specifically, in step S302, the analysis layer analyzes in real time the network access log obtained from the acquisition layer, for example based on URL regular expressions; or, further, it first deduplicates the network access log and then analyzes the deduplicated log in real time based on URL regular expressions, to obtain the corresponding analysis result.
Here, a network access log is a file in which a web server records various raw information, such as the requests it receives and processes and its runtime errors; it contains the web page addresses (URLs) that network users request. A URL consists of three parts: protocol, domain name, and request address. A complete URL uniquely identifies one requested resource, such as a page, content module, file, or multimedia resource. By extracting the information in a URL, one can learn which web content a network user has accessed; by analyzing the URLs in the network access logs of the whole network, one can learn how the various web resources are accessed, for example their access counts and access frequency.
Here, the analysis layer may comprise, for example, a Kafka log receiving queue, a Storm real-time analysis cluster, an HBase analysis-result preprocessing cluster, a MySQL result-report storage cluster, and so on.
The Kafka log receiving queue is used to obtain network access logs from the acquisition layer.
The Storm real-time analysis cluster is used to analyze the acquired network access logs in real time. Storm is a real-time, distributed, highly fault-tolerant computation system that can process large volumes of data and, while ensuring high reliability, allows processing to be carried out closer to real time. Its advantages include easy scaling, a guarantee that every message is processed, simple cluster management, high fault tolerance, and support for processing logic written in any language.
The HBase analysis-result preprocessing cluster is used to preprocess the analysis results obtained by the analysis layer. HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system. Unlike a general relational database, it is a database suited to storing unstructured data, and it is column-based rather than row-based. With HBase, large-scale structured storage clusters can be built on inexpensive PC servers.
The MySQL result-report storage cluster is used to store analysis results in the form of report tables. In this way, the report layer can use the report data in MySQL directly to present it to the user, without being concerned with the underlying collection and analysis workflows.
Those skilled in the art will understand that the above manner of analyzing network access logs in real time is only an example; other existing or future manners of analyzing network access logs in real time, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
In step S303, the report layer presents the analysis result to the user in the form of a report. Specifically, in step S303, the report layer takes the analysis result obtained by the real-time analysis of the analysis layer and presents it to the user as a report, for example by means of charting technology. For example, assuming the MySQL result-report storage cluster of the analysis layer stores analysis results as report tables, then in step S303 the report layer obtains the report data directly from MySQL and presents it to the user as a report.
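Step S303 can be pictured, in its simplest form, as turning stored report rows into something a person can read. The sketch below renders a plain-text table; the column names and widths are hypothetical, and a production report layer would more likely draw charts in a browser.

```python
def render_report(rows):
    """Format (rule, requests) rows as a fixed-width plain-text report."""
    header = f"{'rule':<12}{'requests':>10}"
    lines = [header, "-" * len(header)]
    for rule, requests in rows:
        lines.append(f"{rule:<12}{requests:>10}")
    return "\n".join(lines)

# Hypothetical rows as they might be read from the report storage.
report = render_report([("playback", 120), ("images", 40)])
```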
Those skilled in the art will be understood that; the above-mentioned mode representing analysis result is only citing; other existing or modes representing analysis result that may occur from now on, as being applicable to the present invention, within also should being included in scope, and are contained in this at this with way of reference.
Preferably, the steps of equipment 1 operate continuously. Specifically, in step S301, the acquisition layer acquires the network access log and sends it to the corresponding analysis layer; in step S302, the analysis layer analyzes the acquired network access log in real time to obtain a corresponding analysis result; in step S303, the report layer presents the analysis result to the user in the form of a report. Here, those skilled in the art will understand that "continuously" means that the steps of equipment 1 respectively acquire the network access log, analyze it in real time, and present the analysis result to the user as a report, each according to a preset or dynamically adjusted operating mode.
Here, equipment 1 analyzes the network access log in real time based on a layered approach and presents the analysis result to the user in the form of a report. This provides a real-time Internet HTTP business analysis method which, based on analysis indicators of population statistics, can reflect the service condition of Internet HTTP business relatively objectively and accurately, and solves the problem that large Internet companies need to monitor online HTTP business in a real-time and customizable manner.
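The three steps can be sketched as three small functions chained together. This is a minimal sketch, not the patented implementation: the log line layout (time, method, URL, status), the function names acquire, analyze and report, and the sample patterns are all assumptions made for illustration.

```python
import re
from collections import Counter

def acquire(raw_lines):
    """Acquisition layer: parse raw log lines and pass the requested URL on."""
    for line in raw_lines:
        fields = line.split()
        if len(fields) >= 3:          # skip malformed lines
            yield fields[2]           # hypothetical layout: time, method, URL, status

def analyze(urls, patterns):
    """Analysis layer: count, per named pattern, how many URLs match it."""
    compiled = {name: re.compile(rx) for name, rx in patterns.items()}
    counts = Counter()
    for url in urls:
        for name, rx in compiled.items():
            if rx.search(url):
                counts[name] += 1
    return counts

def report(counts):
    """Report layer: turn the counts into report lines, busiest first."""
    return [f"{name}: {n}" for name, n in counts.most_common()]

log = [
    "2014-07-07T10:00:00 GET http://www.example.com/video/1 200",
    "2014-07-07T10:00:01 GET http://www.example.com/search?q=a 200",
    "2014-07-07T10:00:02 GET http://www.example.com/video/2 500",
]
patterns = {"video": r"/video/\d+", "search": r"/search\?"}
print("\n".join(report(analyze(acquire(log), patterns))))
```

Because acquire is a generator, the analysis layer consumes log lines as they arrive rather than waiting for a complete file, which is one simple way to approximate the continuous operation described above.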
Preferably, in step S302, the analysis layer analyzes the network access log in real time based on a URL regular expression to obtain a corresponding analysis result. Specifically, in step S302, the analysis layer uses the URL regular expression to analyze, in real time, the network access log obtained from the acquisition layer, for example by matching the URLs in the network access log against the URL regular expression, to obtain the corresponding analysis result.
Here, the URL regular expression may be input by the user in real time, or may be obtained by the analysis layer from an expression library that stores URL regular expressions, for example by selecting a suitable URL regular expression from the expression library according to actual conditions.
Here, a regular expression is a tool for text matching: a single string is typically used to describe and match a set of strings that conform to a certain syntactic rule. A regular expression is a logical formula for operating on strings: predefined special characters, and combinations of them, form a "rule string" that expresses a filtering logic over strings.
Here, the URLs in the network access log are matched against the URL regular expression, which contains a preset keyword. If a URL matches, the URL contains the keyword; if it does not match, the URL does not contain it. By matching a URL against at least one URL regular expression, the information contained in the URL, or the category of that information, can be determined.
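As an illustration of determining a category by matching a URL against several URL regular expressions, consider the following sketch. The category names, the patterns, and the example.com URLs are hypothetical; the patent does not specify any concrete expression.

```python
import re

# Hypothetical category -> URL regular expression, each built around a keyword.
CATEGORY_PATTERNS = {
    "video": re.compile(r"^https?://[^/]+/video/"),
    "search": re.compile(r"^https?://[^/]+/search\b"),
    "static": re.compile(r"\.(?:css|js|png)(?:\?|$)"),
}

def classify(url):
    """Return the sorted list of categories whose pattern matches the URL."""
    return sorted(name for name, rx in CATEGORY_PATTERNS.items() if rx.search(url))

print(classify("http://www.example.com/video/123"))
print(classify("http://img.example.com/video/thumb.png"))
print(classify("http://www.example.com/about"))
```

A URL may match more than one expression, as the second call shows, and may match none, in which case it simply falls outside every monitored category.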
Here, equipment 1 mines value from the HTTP network access log in real time according to URL regular expressions input by the user or stored in the expression library. As a core component of the HTTP protocol, the URL is very closely associated with the online business. Building regular expressions at the URL level meets a wide range of flexibility requirements: one expression can span multiple domain names, or can narrow down to a single page.
More preferably, in step S302, the analysis layer obtains a URL regular expression from an expression library, and analyzes the network access log in real time based on that URL regular expression to obtain a corresponding analysis result. Specifically, URL regular expressions may be stored in an expression library, which the user can establish or update by inputting URL regular expressions. In step S302, the analysis layer obtains a URL regular expression by interacting with the expression library, for example through one or more calls to an application programming interface (API) provided by the expression library or through another agreed communication mode, for example choosing a suitable URL regular expression from the expression library based on actual conditions. The analysis layer then, in step S302, analyzes the network access log obtained from the acquisition layer in real time based on the obtained URL regular expression, to obtain the corresponding analysis result.
Here, the expression library stores URL regular expressions; it may be located in equipment 1, or in a third-party device connected to equipment 1 via a network.
Preferably, the method further comprises step S305 (not shown). In step S305, equipment 1 obtains a URL regular expression newly added by the user and, according to the newly added URL regular expression, establishes or updates the expression library. Specifically, the user may input a URL regular expression at any time, and the analysis layer may use it directly to analyze the network access log in real time; the newly added URL regular expression may also be obtained by equipment 1, which in step S305 stores it in the expression library, thereby establishing or updating the expression library.
Those skilled in the art will understand that the above manner of establishing or updating the expression library is only an example; other existing or future manners of establishing or updating the expression library, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
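One possible shape for such an expression library is sketched below. The class ExpressionLibrary, its methods add and select, and the idea of keying expressions by a name are assumptions for illustration; the patent only states that expressions are stored and can be added or updated. The sketch keeps everything in memory, whereas the text allows the library to live on a separate device reached over a network.

```python
import re

class ExpressionLibrary:
    """In-memory stand-in for the expression library of URL regular expressions."""

    def __init__(self):
        self._patterns = {}

    def add(self, name, pattern):
        """Store a new expression, or replace the one already saved under `name`."""
        try:
            compiled = re.compile(pattern)
        except re.error as exc:
            raise ValueError(f"invalid URL regular expression {pattern!r}") from exc
        self._patterns[name] = compiled

    def select(self, names):
        """Return the compiled expressions for the requested names, skipping unknown ones."""
        return {n: self._patterns[n] for n in names if n in self._patterns}

library = ExpressionLibrary()
library.add("video", r"/video/\d+")       # establishes the library
library.add("video", r"/videos?/\d+")     # updates an existing entry
selected = library.select(["video", "missing"])
print(bool(selected["video"].search("http://www.example.com/videos/9")))
```

Validating each expression when it is added means a malformed pattern entered by a user is rejected before the analysis layer ever tries to use it.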
Preferably, equipment 1 uses big-data processing techniques similar to Hadoop, but with a stronger emphasis on real-time operation: it applies the idea of streams and filters the data continuously. It can also scale horizontally, so that it can cope with rapid growth in network traffic. Equipment 1 can additionally aggregate and store valuable data: the stream data is aggregated and written to a cache device for persistent storage, which safeguards the performance of final data output and supports future queries.
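The stream-style filtering and aggregation can be illustrated with a small sketch. It assumes records arrive as (ISO timestamp, URL) pairs and aggregates matching requests per minute; the function names and the per-minute granularity are hypothetical. Horizontal scaling and the write to a cache device are not shown.

```python
import re
from collections import Counter

def filter_stream(records, rx):
    """Lazily keep only the (timestamp, url) records whose URL matches `rx`."""
    return (rec for rec in records if rx.search(rec[1]))

def aggregate_by_minute(records):
    """Merge a stream of (timestamp, url) records into per-minute request counts."""
    counts = Counter()
    for timestamp, _url in records:
        counts[timestamp[:16]] += 1   # "YYYY-MM-DDTHH:MM" prefix of an ISO timestamp
    return counts

stream = iter([
    ("2014-07-07T10:00:05", "/video/1"),
    ("2014-07-07T10:00:40", "/search?q=a"),
    ("2014-07-07T10:00:59", "/video/2"),
    ("2014-07-07T10:01:10", "/video/3"),
])
per_minute = aggregate_by_minute(filter_stream(stream, re.compile(r"^/video/")))
print(dict(per_minute))
```

Only the small per-minute totals need to be stored, not the raw records, which is what keeps later report queries cheap.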
Here, equipment 1 adopts a layered design, which reduces coupling, and is divided into an acquisition layer, an analysis layer, and a report layer. The analysis layer monitors the network in real time based on URL regular expressions and can analyze, in real time, user access logs of web applications from all regions. Both business personnel and developers can add URL regular expressions on a self-service basis, and the report system at the terminal can then output the corresponding request-volume information, time information, and fault information within a short time.
Preferably, in step S303, the report layer presents the analysis result to the user in the form of a report based on a real-time graphics system. Specifically, in step S303, the report layer takes the analysis result obtained by the analysis layer through real-time analysis and presents it to the user as a report based on a real-time graphics system, for example one developed with Django, Cacti, or the like.
Here, the data is output graphically, so a browser is all that is needed to see the state of HTTP business across the whole network. The report layer uses, for example, recent browser-side drawing technology that does not depend on third-party plug-ins such as Flash, which solves the problem of real-time output. This drawing technology is built, for example, on HTML5 Canvas and can take full advantage of the GPU acceleration of modern computers.
Here, equipment 1 presents the analysis result to the user in the form of a report based on a real-time graphics system, which makes it convenient for the user to view and use the analysis result.
Fig. 4 shows a flow chart of a method for analyzing a network access log based on a layered approach in accordance with a preferred embodiment of the present invention. The preferred embodiment is described in detail below with reference to Fig. 4. Specifically, in step S401, the acquisition layer acquires the network access log and sends it to the corresponding analysis layer; in step S404, equipment 1 deduplicates the network access log to obtain a deduplicated network access log; in step S402, the analysis layer analyzes the deduplicated network access log in real time based on a URL regular expression to obtain a corresponding analysis result; in step S403, the report layer presents the analysis result to the user in the form of a report. Steps S401 and S403 are identical or substantially identical to the corresponding steps shown in Fig. 3, so they are not described again here and are incorporated herein by reference.
In step S404, equipment 1 deduplicates the network access log to obtain a deduplicated network access log. Specifically, URLs are accessed repeatedly a very large number of times in the network access log of the whole network, and analyzing every URL occurrence in real time would reduce the efficiency of analysis. Therefore, in step S404, equipment 1 deduplicates the network access log. For example, a URL table is established; the URLs in the network access log are taken out one at a time, and each is checked against the URL table: if it is not yet in the table, it is added; if it is already present, it is not added. Once all URLs have been processed, the URLs in the URL table constitute the deduplicated network access log.
Those skilled in the art will understand that the above manner of deduplicating the network access log is only an example; other existing or future manners of deduplicating network access logs, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
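The URL-table procedure can be written in a few lines. This sketch uses a Python set as the URL table, which is an implementation choice made here and not something the patent prescribes; the function name deduplicate is likewise hypothetical.

```python
def deduplicate(urls):
    """Return the distinct URLs in first-seen order, using a URL table."""
    url_table = set()      # the "URL table": every URL seen so far
    unique = []
    for url in urls:
        if url not in url_table:   # not yet in the table: add it
            url_table.add(url)
            unique.append(url)
    return unique

print(deduplicate(["/a", "/b", "/a", "/c", "/b"]))
```

A set gives constant-time membership checks on average, so the cost of deduplication grows linearly with the number of log entries. Note that this form discards how often each URL occurred; if request counts are needed in the report, they would have to be tracked separately.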
Subsequently, in step S402, the analysis layer analyzes the deduplicated network access log in real time based on a URL regular expression to obtain a corresponding analysis result. Specifically, in step S402, the analysis layer uses a URL regular expression input by the user or taken from the expression library to analyze the deduplicated network access log in real time, for example by matching the deduplicated URLs against the URL regular expression and taking the resulting matches as the corresponding analysis result.
Here, equipment 1 deduplicates the network access log and analyzes the deduplicated log in real time, which further improves the efficiency of analysis.
It should be noted that the present invention can be implemented in software and/or a combination of software and hardware, for example using an application-specific integrated circuit (ASIC), a general-purpose computer, or any other similar hardware device. In one embodiment, the software program of the present invention can be executed by a processor to realize the steps or functions described above. Similarly, the software program of the present invention (including related data structures) can be stored in a computer-readable recording medium, for example RAM, a magnetic or optical drive, a floppy disk, or a similar device. In addition, some steps or functions of the present invention may be implemented in hardware, for example as a circuit that cooperates with a processor to perform each step or function.
In addition, part of the present invention may be embodied as a computer program product, for example computer program instructions which, when executed by a computer, can invoke or provide the method and/or technical solution according to the present invention through the operation of that computer. The program instructions that invoke the method of the present invention may be stored in a fixed or removable recording medium, and/or transmitted via a data stream in a broadcast or other signal-bearing medium, and/or stored in the working memory of a computer device that runs according to the program instructions. Here, one embodiment of the present invention comprises a device that includes a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the device is triggered to run the methods and/or technical solutions according to the aforementioned embodiments of the present invention.
To those skilled in the art, it is evident that the present invention is not limited to the details of the above exemplary embodiments, and that the present invention can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in all respects as exemplary and non-restrictive. The scope of the present invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalency of the claims are therefore intended to be embraced by the present invention. Any reference numeral in the claims should not be construed as limiting the claim concerned. In addition, the word "comprising" clearly does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a device claim may also be realized by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.

Claims (12)

1. A method for analyzing a network access log based on a layered approach, wherein the method comprises the following steps:
a. an acquisition layer acquires a network access log and sends it to a corresponding analysis layer;
b. the analysis layer analyzes the acquired network access log in real time to obtain a corresponding analysis result;
c. a report layer presents the analysis result to a user in the form of a report.
2. The method according to claim 1, wherein step b comprises:
- the analysis layer analyzes the network access log in real time based on a URL regular expression to obtain a corresponding analysis result.
3. The method according to claim 2, wherein step b comprises:
- the analysis layer obtains a URL regular expression from an expression library;
- the analysis layer analyzes the network access log in real time based on the URL regular expression to obtain a corresponding analysis result.
4. The method according to claim 3, wherein the method further comprises:
- obtaining a URL regular expression newly added by the user;
- establishing or updating the expression library according to the newly added URL regular expression.
5. The method according to any one of claims 2 to 4, wherein the method further comprises:
- deduplicating the network access log to obtain a deduplicated network access log;
wherein step b comprises:
- the analysis layer analyzes the deduplicated network access log in real time based on a URL regular expression to obtain a corresponding analysis result.
6. The method according to claim 1, wherein step c comprises:
- the report layer presents the analysis result to the user in the form of a report based on a real-time graphics system.
7. Equipment for analyzing a network access log based on a layered approach, wherein the equipment comprises:
an acquisition layer device, for acquiring a network access log and sending it to a corresponding analysis layer device;
an analysis layer device, for analyzing the acquired network access log in real time to obtain a corresponding analysis result;
a report layer device, for presenting the analysis result to a user in the form of a report.
8. The equipment according to claim 7, wherein the analysis layer device is used for:
- analyzing the network access log in real time based on a URL regular expression to obtain a corresponding analysis result.
9. The equipment according to claim 8, wherein the analysis layer device is used for:
- obtaining a URL regular expression from an expression library;
- analyzing the network access log in real time based on the URL regular expression to obtain a corresponding analysis result.
10. The equipment according to claim 9, wherein the equipment further comprises an updating device, for:
- obtaining a URL regular expression newly added by the user;
- establishing or updating the expression library according to the newly added URL regular expression.
11. The equipment according to any one of claims 8 to 10, wherein the equipment further comprises:
a deduplication device, for deduplicating the network access log to obtain a deduplicated network access log;
wherein the analysis layer device is used for:
- analyzing the deduplicated network access log in real time based on a URL regular expression to obtain a corresponding analysis result.
12. The equipment according to claim 7, wherein the report layer device is used for:
- presenting the analysis result to the user in the form of a report based on a real-time graphics system.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410320752.6A CN105245394A (en) 2014-07-07 2014-07-07 Method and equipment for analyzing network access log based on layered approach

Publications (1)

Publication Number Publication Date
CN105245394A true CN105245394A (en) 2016-01-13

Family

ID=55042905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410320752.6A Pending CN105245394A (en) 2014-07-07 2014-07-07 Method and equipment for analyzing network access log based on layered approach

Country Status (1)

Country Link
CN (1) CN105245394A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160113