CN101232399B - Analytical method of website abnormal visit - Google Patents
- Publication number
- CN101232399B (application number CN2008100104236A)
- Authority
- CN
- China
- Prior art keywords
- session
- access
- threshold values
- data
- anomaly analysis
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Abstract
An analysis method for abnormal website access, belonging to the fields of website access analysis, data mining, and security auditing. The invention quickly identifies "abnormal access" sessions from the session duration together with abnormal-access characteristics such as the number of URL requests, the traffic sent or received, and the server processing time. It presents the results visually through charts and tables, shows which abnormal access is persistent and which is sudden, and provides a tool for judging abnormal access manually.
Description
Technical field:
The present invention relates to the analysis of website access behavior on the Internet. It helps website administrators discover abnormal access behavior, determine the source of the abnormal access, judge its type, and identify the pages being "attacked" and the security risks present on the website.
Background art:
People usually visit websites through a browser. Such activity is a gentle, intermittent, random process, and such access is called "normal access". Normal access has the following features: it always takes place within a limited time and does not stay on one or a few pages for hours on end; a person browsing a website always finishes viewing one page before moving on to the next; requesting several or even dozens of pages within 1 second without pausing is something no manual operation can achieve. So-called "abnormal access" is access performed automatically by a computer program rather than by a person using a browser. Its characteristics are rapid, uninterrupted page requests with no pauses or intervals, or a very long duration. Such access includes search-engine "spider" programs and "hacker" attack programs. At present, observing, identifying, and analyzing abnormal access remains a difficult problem. There is no simple way to find it, and most detection is done by hand through manual inspection.
Summary of the invention:
To solve the above problems, the present invention starts from observing how people naturally access websites, takes network communication protocol standards as its theoretical basis, and provides an automated method for analyzing abnormal website access.
The object of the invention is achieved by the following technical scheme:
The method for analyzing abnormal website access comprises the following steps:
(1) Determine the visitor type: according to the actual situation of the website, decide whether a visitor is identified by IP, by IP+UserAgent, by Cookie, or by code embedded in the website's pages;
(2) Data cleansing: read the access log and analyze, clean, and filter the access records. The naturally formed records are oriented to single URL requests; through analysis, records from the same visitor whose interval is less than the system-defined session (Session) timeout (Time Out) are given the same session identifier (Session ID). This produces cleaned data in which each record carries a Session ID, stored in an optimized data structure;
(3) Select the anomaly analysis index: by default, "URL request count" is the anomaly analysis index X; "traffic" or "server processing time" can be selected as X instead when needed;
(4) Set thresholds: set the session duration limit ΔT_k and the threshold for analysis index X;
(5) Anomaly analysis: read the cleaned data produced by the data cleansing program in step (2) and analyze each access session record. Subtract the first access time T_1 in the session from the last access time T_2 to obtain the session duration ΔT = T_2 - T_1. If ΔT is within the range ΔT_k and analysis index X exceeds the threshold set in step (4), the session is identified as an "access anomaly" and stored as abnormal access data;
(6) Determine the anomaly type: judge whether the abnormal data obtained in step (5) is a persistent anomaly or a sudden anomaly, and present the result in clear, intuitive charts.
The thresholds in step (4) fall into three classes. The first class applies to the time range T_r covered by the whole set of access data: either a value of index X or an average value of index X is set as the threshold. The second class divides T_r into several equal subintervals T_s, and a value of X or an average of X is set as the threshold. The third class further divides T_s into several equal subintervals T_f, and a value of X or an average of X is set as the threshold.
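The three threshold classes can be pictured as counting index X in nested, equal-width time buckets. The following is a minimal sketch, assuming X is the URL request count, that timestamps are seconds measured from the start of T_r, and that T_r, T_s, and T_f are one day, one hour, and one minute. The function names and the layout of `thresholds` are illustrative assumptions, not part of the patent.

```python
from collections import Counter


def bucket_counts(timestamps: list[float], start: float, width: float) -> Counter:
    """Count requests per equal-width subinterval, indexed from `start`."""
    return Counter(int((t - start) // width) for t in timestamps)


def buckets_over_threshold(timestamps: list[float], start: float,
                           thresholds: dict[str, tuple[float, int]]) -> dict[str, list[int]]:
    """For each tier, list the subinterval indexes whose count exceeds the limit.

    `thresholds` maps a tier name to (subinterval width in seconds, limit).
    A tier that is left out is simply not checked (no threshold set).
    """
    result = {}
    for tier, (width, limit) in thresholds.items():
        counts = bucket_counts(timestamps, start, width)
        result[tier] = sorted(i for i, n in counts.items() if n > limit)
    return result


# Hypothetical limits: 5 per day (T_r), 3 per hour (T_s), 3 per minute (T_f).
EXAMPLE_THRESHOLDS = {"T_r": (86400, 5), "T_s": (3600, 3), "T_f": (60, 3)}
```

Keeping the tiers in one mapping lets a site leave any tier unset, which matches the default of having no threshold and falling back to manual analysis.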
The data cleansing steps are as follows:
(1) Read the access data;
(2) Judge whether the access data is URL junk data; if so, the access data is cleaned out;
(3) If the result of step (2) is no, record the data with the same Session ID into the cleaned data using the optimized data structure.
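The patent does not define "URL junk data". The sketch below assumes, purely for illustration, that requests for static assets such as images, style sheets, and scripts are junk. The suffix list, the `url` field name, and the function names are hypothetical.

```python
from urllib.parse import urlsplit

# Assumption: static assets count as junk. The patent gives no definition,
# so this suffix list is illustrative only.
JUNK_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


def is_junk(url: str) -> bool:
    """True if the URL path (query string ignored) ends with a junk suffix."""
    return urlsplit(url).path.lower().endswith(JUNK_SUFFIXES)


def clean(records: list[dict]) -> list[dict]:
    """Drop junk records and keep the rest in their original order."""
    return [r for r in records if not is_junk(r["url"])]
```

Only the path is tested, so a page request such as `/search?q=x.css` is kept even though its query string ends in a junk suffix.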
Beneficial effects of the invention:
Finding "abnormal access" is of great value to website administrators. First, it reveals when the website is under hacker attack and provides solid evidence. Second, it identifies the attack source: the IP launching the attack, and even the client launching it. Third, it identifies the pages under attack. Fourth, it corrects errors in access statistics by keeping "abnormal access" out of the statistics. Fifth, it exposes the traces left by search "spiders", so that their access patterns can be understood and keywords can be deliberately placed for spiders to crawl, raising the site's click-through rate. Sixth, to keep trade secrets from being harvested by spiders, it helps in drawing up anti-spider measures.
From the standpoint of Internet communication principles, all WWW access takes place over the HTTP protocol. HTTP is an upper-layer application of the TCP/IP protocol and, in the mode considered here, communicates over "short connections"; some also describe HTTP as a "connectionless" protocol. Each URL request may involve many TCP/IP exchanges: the client initiates a TCP/IP connection request and, after obtaining the document at the URL, disconnects immediately. The connection is not held open until the browser is closed. Even if the browser stays on a page for a long time, the TCP/IP connection is dropped as soon as the requested download finishes; it is not, as people intuitively assume, dropped only when the browser is closed or navigates to another website.
Based on the above principle, the invention performs abnormal access analysis in six steps. Step one, determine the visitor type: decide whether a visitor is identified by IP, by IP+UserAgent, by Cookie, or by code embedded in the website's pages. Step two, data cleansing: analyze and organize the naturally formed, disordered access logs or other access records oriented to single URLs; through analysis, give the access records of one user's session the same session identifier (Session ID), and record the user sessions in an optimized data structure. Step three, select the anomaly analysis index: decide whether the analysis uses "URL request count", "traffic", or "server processing time" as the index. Step four, set thresholds: set three different thresholds at the three levels T_r, T_s, and T_f. Step five, anomaly analysis: check whether the access session duration ΔT is within the range ΔT_k and whether analysis index X exceeds the threshold that was set; if it does, the session is identified as "abnormal access". Step six, determine the anomaly type: analyze the trend of index X within T_r; if X shows uninterrupted access, it is identified as a "persistent anomaly"; if X increases suddenly within a T_s or T_f interval, that interval is identified as a "sudden anomaly".
Two difficulties have stood in the way of such analysis. First, abnormal access analysis rests on analyzing all visitors and all visited pages, and so far there has been no good method for mining such massive volumes of access data. Second, how to identify visitors, which index to use for analyzing anomalies, and how to judge anomalies have all remained exploratory. The present invention fundamentally solves both problems.
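The distinction drawn in the sixth step between persistent and sudden anomalies could be approximated as follows. This is only a sketch: it assumes index X has already been counted per subinterval of T_r, and the `coverage` and `spike` parameters are hypothetical tuning values that the patent does not specify.

```python
def classify_trend(counts: list[int], coverage: float = 0.9, spike: float = 5.0) -> str:
    """Label a per-subinterval series of index X.

    Returns 'persistent' when nearly every subinterval of T_r shows activity,
    'sudden' when one subinterval towers over the average, else 'unclear'.
    Both cut-offs are assumptions made for this example.
    """
    if not counts or sum(counts) == 0:
        return "unclear"
    active_share = sum(1 for c in counts if c > 0) / len(counts)
    if active_share >= coverage:
        return "persistent"
    mean = sum(counts) / len(counts)
    if max(counts) >= spike * mean:
        return "sudden"
    return "unclear"
```

For a one-day T_r split into 24 hourly counts, a crawler active every hour would be labeled persistent, while a single hour holding all the traffic would be labeled sudden.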
Description of drawings:
Fig. 1 is a flow chart of the method for analyzing abnormal website access;
Fig. 2 is a flow chart of determining the visitor type;
Fig. 3 is a flow chart of data cleansing;
Fig. 4 is a flow chart of selecting the analysis index;
Fig. 5 is a flow chart of setting thresholds;
Fig. 6 is a trend chart of persistent abnormal access;
Fig. 7 is a trend chart of sudden abnormal access.
Embodiment:
The method for analyzing abnormal website access comprises the following steps:
(1) Determine the visitor type: the determine-visitor-type step "A" decides whether a visitor is identified by IP, by IP+UserAgent, by Cookie, or by code embedded in the website's pages;
(2) Data cleansing: the data cleansing program "B" reads the access records in the access data "C". Original access records that come from the same visitor, form consecutive URL visits, and are separated by gaps of no more than 30 minutes (or the time limit set by the Web server) are given the same session identifier (Session ID), producing the cleaned data "G" in which each record carries a Session ID;
(3) Select the analysis index: the analysis index "D" must be selected before anomaly analysis. The default analysis index is the URL request count; traffic or server processing time can be selected as the index when needed;
(4) Set thresholds: first, set the session duration limit ΔT_k and the threshold for index X. The thresholds fall into three classes. The first class applies to the time range T_r covered by the whole set of access data: a value of index X or an average of X is set as the threshold (e.g. X per hour). The second class divides T_r into several equal subintervals T_s, and a value of X or an average of X is set as the threshold (e.g. X per minute). The third class further divides T_s into several equal subintervals T_f, and a value of X or an average of X is set as the threshold (e.g. X per second). Here T_r is the time range covered by the access data and can be one day, one week, one month, and so on. T_s is obtained by dividing T_r; for example, if T_r is one day, it can be divided by hour into 24 intervals T_s. T_f is obtained by dividing T_s; for example, if T_s is one hour, it can be divided by minute into 60 intervals T_f. Then, for the three levels T_r, T_s, and T_f, the thresholds "E" for analysis index X are set according to the website's own circumstances. By default no threshold is set, which is equivalent to manual analysis;
(5) Anomaly analysis: the read-cleaned-data step "F" reads data from the cleaned data "G". For each access session record, subtract the first access time T_1 in the session from the last access time T_2 to obtain the session duration ΔT = T_2 - T_1. If ΔT is within the range ΔT_k and analysis index X exceeds the threshold set in step (4), the store-abnormal-data program "H" stores the analysis result in the abnormal data "I";
(6) Determine the anomaly type: judge whether the abnormal data obtained in step (5) is a persistent anomaly or a sudden anomaly. The abnormal access data is read from the abnormal data "I". If the website is accessed continuously within T_r, the session is identified as persistent abnormal access "J"; its trend chart is shown in Fig. 6. If the access increases suddenly within a T_s or T_f interval, the session is identified as sudden abnormal access "K"; its trend chart is shown in Fig. 7.
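The test in step (5) reduces to one comparison per session. A minimal sketch follows, assuming that "within the range ΔT_k" means a closed interval of durations and that X is a single number per session. The `Session` fields and the function name are illustrative and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class Session:
    session_id: str
    first_access: float  # T_1, in seconds
    last_access: float   # T_2, in seconds
    x: float             # analysis index X, e.g. the URL request count


def is_abnormal(s: Session, dt_range: tuple[float, float], x_threshold: float) -> bool:
    """Step (5): duration T_2 - T_1 lies in the assumed range and X exceeds its threshold."""
    duration = s.last_access - s.first_access
    low, high = dt_range
    return low <= duration <= high and s.x > x_threshold
```

With a range of 0 to 1 second and a threshold of 20 requests, a session that issues 40 requests within one second is flagged, which matches the observation that dozens of page requests per second cannot come from manual browsing.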
The data cleansing steps, shown in Fig. 3, are as follows:
(1) Read the access data "B_1";
(2) Judge whether the access data is URL junk data; if so, the access data is cleaned out;
(3) If the result of step (2) is no, store the data with the same Session ID, "B_2", in the cleaned data "G" using the optimized data structure.
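Before records can be stored by Session ID, the IDs have to be assigned as described in step (2) of the method. The sketch below shows one way to do this, assuming each record is a dict with `ip`, `user_agent`, and `time` (seconds) fields and that the visitor is identified by IP+UserAgent. The field names, the `key` parameter, and the integer session IDs are assumptions made for illustration.

```python
from itertools import count
from typing import Callable, Hashable

TIMEOUT = 30 * 60  # seconds; the 30-minute limit mentioned in the embodiment


def ip_user_agent(record: dict) -> Hashable:
    """Identify the visitor by IP+UserAgent (one of the options in step 1)."""
    return (record["ip"], record["user_agent"])


def assign_sessions(records: list[dict],
                    key: Callable[[dict], Hashable] = ip_user_agent,
                    timeout: float = TIMEOUT) -> list[dict]:
    """Return copies of the records, in time order, with a 'session_id' added."""
    ids = count(1)
    last_seen: dict[Hashable, tuple[float, int]] = {}  # visitor -> (last time, id)
    out = []
    for r in sorted(records, key=lambda rec: rec["time"]):
        visitor = key(r)
        prev = last_seen.get(visitor)
        if prev is not None and r["time"] - prev[0] < timeout:
            sid = prev[1]    # gap shorter than the timeout: same session
        else:
            sid = next(ids)  # first request, or gap too long: new session
        last_seen[visitor] = (r["time"], sid)
        out.append({**r, "session_id": sid})
    return out
```

Passing a different `key`, for example `lambda r: r["ip"]`, switches the visitor type without touching the session logic.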
Claims (2)
1. A method for analyzing abnormal website access, characterized in that the steps are as follows:
(1) Determine the visitor type: according to the actual situation of the website, decide whether a visitor is identified by IP, by IP+User Agent, by Cookie, or by code embedded in the website's pages;
(2) Data cleansing: read the access log; analyze, clean, and filter the access session records and optimize the data structure; for the naturally formed access records oriented to single URL requests, through analysis, give the same access session identifier (Session ID) to records from the same visitor whose interval is less than the system-defined session (Session) timeout (Time Out), forming cleaned data in which each record carries a Session ID, and store it in the optimized data structure;
(3) Select the anomaly analysis index: by default, "URL request count" is the anomaly analysis index X; "traffic" or "server processing time" can be selected as anomaly analysis index X when needed;
(4) Set thresholds: set the session duration limit ΔT_k and the threshold for anomaly analysis index X;
(5) Anomaly analysis: read the cleaned data produced by the data cleansing program in step (2) and analyze each access session record; subtract the first access time T_1 in the session from the last access time T_2 to obtain the session duration ΔT = T_2 - T_1; if ΔT is within the range ΔT_k and anomaly analysis index X exceeds the threshold set in step (4), the session is identified as an "access anomaly" and stored as access anomaly data;
(6) Determine the anomaly type: judge whether the access anomaly data obtained in step (5) is a persistent anomaly or a sudden anomaly, and present the result in clear, intuitive charts.
2. The method for analyzing abnormal website access according to claim 1, characterized in that the thresholds in step (4) fall into three classes: the first class applies to the time range T_r covered by the whole set of access data, in which a value of anomaly analysis index X or an average of anomaly analysis index X is set as the threshold; the second class divides T_r into several equal subintervals T_s, and in T_s a value of anomaly analysis index X or an average of anomaly analysis index X is set as the threshold; the third class further divides T_s into several equal subintervals T_f, and in T_f a value of anomaly analysis index X or an average of anomaly analysis index X is set as the threshold.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2008100104236A CN101232399B (en) | 2008-02-18 | 2008-02-18 | Analytical method of website abnormal visit |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2008100104236A CN101232399B (en) | 2008-02-18 | 2008-02-18 | Analytical method of website abnormal visit |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101232399A CN101232399A (en) | 2008-07-30 |
CN101232399B (en) | 2010-06-23 |
Family
ID=39898593
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2008100104236A (Expired - Fee Related; CN101232399B (en)) | 2008-02-18 | 2008-02-18 | Analytical method of website abnormal visit |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101232399B (en) |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103379099B (en) * | 2012-04-19 | 2017-08-04 | 阿里巴巴集团控股有限公司 | Hostile attack identification method and system |
CN102932207B (en) * | 2012-11-19 | 2015-12-23 | 北京奇虎科技有限公司 | The method of monitoring website access information and server |
CN103327016B (en) * | 2013-06-06 | 2016-06-22 | 合一信息技术(北京)有限公司 | A kind of computing network Streaming Media exception playback volume the method and system to its correction |
CN103401849B (en) * | 2013-07-18 | 2017-02-15 | 盘石软件(上海)有限公司 | Abnormal session analyzing method for website logs |
CN103475543A (en) * | 2013-09-11 | 2013-12-25 | 北京思特奇信息技术股份有限公司 | Abnormal system service call detection method and system |
CN103546326B (en) * | 2013-11-04 | 2017-01-11 | 北京中搜网络技术股份有限公司 | Website traffic statistic method |
CN103605714B (en) * | 2013-11-14 | 2017-10-03 | 北京国双科技有限公司 | The recognition methods of website abnormal data and device |
CN103593484A (en) * | 2013-12-03 | 2014-02-19 | 南京安讯科技有限责任公司 | Method for filtering garbage logs during mobile phone internet surfing |
US9614853B2 (en) | 2015-01-20 | 2017-04-04 | Enzoo, Inc. | Session security splitting and application profiler |
CN104915455B * | 2015-07-02 | 2017-03-15 | 焦点科技股份有限公司 | A kind of website abnormal based on user behavior accesses recognition methods and system |
CN106921628B (en) * | 2015-12-25 | 2021-10-08 | 阿里巴巴集团控股有限公司 | Method and device for identifying network access source based on network address |
CN107508789B (en) * | 2017-06-29 | 2020-04-07 | 北京北信源软件股份有限公司 | Abnormal data identification method and device |
CN109302297B (en) * | 2017-07-25 | 2022-03-29 | 中国电信股份有限公司 | Method and device for processing network access record and computer readable storage medium |
CN111274516B (en) * | 2018-12-04 | 2024-04-05 | 阿里巴巴新加坡控股有限公司 | Page display method, page configuration method and device |
CN115242606B (en) * | 2022-06-21 | 2024-04-16 | 北京字跳网络技术有限公司 | Data processing method, device, server, storage medium and program product |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1713598A (en) * | 2004-06-25 | 2005-12-28 | 深圳市傲天通信有限公司 | Shared access testing system of internet |
CN1791022A (en) * | 2005-12-26 | 2006-06-21 | 阿里巴巴公司 | Log analyzing method and system |
Also Published As
Publication number | Publication date |
---|---|
CN101232399A (en) | 2008-07-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101232399B (en) | Analytical method of website abnormal visit | |
TWI711938B (en) | System and method for high speed threat intelligence management using unsupervised machine learning and prioritization algorithms | |
US8191149B2 (en) | System and method for predicting cyber threat | |
CN109495377B (en) | Instant E-mail embedded URL credit confirming equipment, system and method | |
Yu et al. | Predicted packet padding for anonymous web browsing against traffic analysis attacks | |
Lichodzijewski et al. | Host-based intrusion detection using self-organizing maps | |
Phillips et al. | Tracing cryptocurrency scams: Clustering replicated advance-fee and phishing websites | |
DE60316543T2 (en) | ADAPTIVE BEHAVIOR-RELATED IMPACT DETECTION | |
US8286248B1 (en) | System and method of web application discovery via capture and analysis of HTTP requests for external resources | |
CN103379099B (en) | Hostile attack identification method and system | |
EP2863611B1 (en) | Device for detecting cyber attack based on event analysis and method thereof | |
CN103179132B (en) | A kind of method and device detecting and defend CC attack | |
Vengatesan et al. | Anomaly based novel intrusion detection system for network traffic reduction | |
CN105659245A (en) | Context-aware network forensics | |
US20060206715A1 (en) | Media analysis method and system for locating and reporting the presence of steganographic activity | |
CN105138709B (en) | Remote evidence taking system based on physical memory analysis | |
DE112006001378T5 (en) | Automatic management of a memory access control | |
WO2011094071A2 (en) | Insider threat correlation tool | |
WO2011094070A2 (en) | Insider threat correlation tool | |
Killer et al. | Security management and visualization in a blockchain-based collaborative defense | |
CN103457909A (en) | Botnet detection method and device | |
CN103118035A (en) | Website access request parameter legal range analysis method and device | |
CN105991634A (en) | Access control method and apparatus | |
Massa et al. | A fraud detection system based on anomaly intrusion detection systems for e-commerce applications | |
Pramono | Anomaly-based intrusion detection and prevention system on website usage using rule-growth sequential pattern analysis: Case study: Statistics of Indonesia (BPS) website |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20100623; Termination date: 20150218 |
EXPY | Termination of patent right or utility model | ||