CN101232399B - Analytical method of website abnormal visit - Google Patents

Info

Publication number
CN101232399B
Authority
CN
China
Prior art keywords
session
access
threshold values
data
anomaly analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2008100104236A
Other languages
Chinese (zh)
Other versions
CN101232399A (en)
Inventor
刘峰
孙宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2008-02-18
Filing date
2008-02-18
Publication date
2010-06-23
Application filed by Individual
Priority to CN2008100104236A
Publication of CN101232399A
Application granted
Publication of CN101232399B
Expired - Fee Related
Anticipated expiration

Abstract

A method for analyzing abnormal access to a website, belonging to the fields of website access analysis, data mining, and security auditing. The invention quickly identifies "abnormal access" sessions from the session duration together with characteristic indices such as the number of URL requests, the traffic sent or received, and the server processing time. It presents the results visually in charts and tables, showing which abnormal accesses are persistent and which are sudden bursts, and thereby provides a tool to support manual judgment of abnormal access.

Description

Analytical method of website abnormal visit
Technical field:
The present invention relates to the analysis of website access behavior on the Internet. It helps website administrators detect abnormal access behavior, determine its source, judge its type, and identify the pages being "attacked" and the potential security risks of the website.
Background art:
People usually visit websites through a browser. This is a gentle, intermittent, random process, and such access is called "normal access". Normal access has the following characteristics: it always takes place within a limited time and does not stay on one or a few pages for hours on end; a person browsing a website always finishes viewing one page before moving on to the next; and requesting several or even dozens of pages within one second without pausing is beyond what manual operation can achieve. "Abnormal access" refers to access performed automatically by a computer program rather than by a person using a browser. Its characteristics are rapid, uninterrupted page requests with no pauses or intervals between them, or access that lasts a very long time. Such access includes search-engine "spider" programs and "hacker" attack programs. At present, observing, identifying, and analyzing "abnormal access" remains difficult. There is no simple way to find it, and the work is mostly done by manual operation and manual recognition.
Summary of the invention:
To solve the above problem, the present invention starts from a study of how people naturally access websites, takes the technical standards of network communication protocols as its theoretical basis, and provides an automated method for analyzing abnormal website access.
The object of the invention is achieved by the following technical solution:
The method for analyzing abnormal website access comprises the following steps:
(1) Determine the visitor type: according to the actual situation of the website, determine whether a visitor is identified by IP, by IP+UserAgent, by Cookie, or by code embedded in the website's pages;
(2) Data cleansing: read the access log, then analyze, clean, and filter the access records. The naturally formed access records are oriented to single URL requests. Through analysis and identification, records from the same visitor whose interval is less than the system-defined session time limit (Time Out) are given the same session identifier (Session ID). This forms cleaned data in which every record carries a Session ID, and the cleaned data are stored in an optimized data structure;
(3) Select the anomaly analysis index: by default, "number of URL requests" is the anomaly analysis index X; "traffic" or "server processing time" may be selected as index X as required;
(4) Set the "thresholds": set the duration ΔT_k and the "threshold" of analysis index X;
(5) Anomaly analysis: read the cleaned data produced by the data cleansing program of step (2) and analyze each access session record. Subtract the first access time T_1 in the session from the last access time T_2 to obtain the session duration ΔT = T_2 - T_1. If ΔT is within the range ΔT_k and analysis index X exceeds the "threshold" set in step (4), the session is identified as "abnormal access" and the abnormal access data are stored;
(6) Anomaly type judgment: judge whether the abnormal data obtained in step (5) represent a persistent anomaly or a sudden anomaly, and present the result in intuitive charts and tables.
The "thresholds" in step (4) fall into three classes. The first class applies to the time range T_r covered by the whole of the access data: either a value of index X or an average value of index X is set as the threshold. The second class divides T_r into several equal sub-intervals T_s and sets a value or an average value of index X as the threshold. The third class further divides T_s into several equal sub-intervals T_f and sets a value or an average value of index X as the threshold.
The data cleansing steps are as follows:
(1) Read the access data;
(2) Judge whether an access record is URL junk data; if so, the record is cleaned out (discarded);
(3) If the result of step (2) is no, the records with the same Session ID are written to the cleaned data in the optimized data structure.
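The cleansing steps above can be illustrated with a minimal Python sketch. The patent does not define what counts as "URL junk data", so the suffix list below (static resources such as images and style sheets) is an assumption, as are the names `JUNK_SUFFIXES`, `is_junk`, and `clean`, and the use of a plain dictionary as the "optimized data structure".

```python
from urllib.parse import urlsplit

# Assumption: requests for static resources are treated as "URL junk data".
JUNK_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")


def is_junk(url: str) -> bool:
    """Step (2): decide whether a requested URL is junk data."""
    path = urlsplit(url).path.lower()  # ignore the query string and letter case
    return path.endswith(JUNK_SUFFIXES)


def clean(records):
    """Steps (1)-(3): read (session_id, url) records, drop junk, group by Session ID."""
    cleaned = {}
    for session_id, url in records:
        if is_junk(url):
            continue  # junk records are cleaned out
        cleaned.setdefault(session_id, []).append(url)
    return cleaned
```

Calling `clean` on a list of `(session_id, url)` pairs returns a dictionary that maps each Session ID to its remaining URL requests in their original order.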
Beneficial effects of the present invention:
Finding "abnormal access" is highly significant for website administrators. First, it reveals whether the website is under hacker attack and secures firm evidence. Second, it determines the attack source: the IP that launched the attack, or even the attacking client. Third, it determines which pages were attacked. Fourth, it corrects errors in access statistics by keeping "abnormal access" out of the statistics. Fifth, it allows the traces left by "spider" crawls to be observed and their access patterns to be understood, so that "keywords" can be deliberately arranged for the spiders to crawl, improving the click-through rate of the website. Sixth, it helps formulate anti-spider schemes to prevent trade secrets from being stolen by spiders.
From the standpoint of Internet communication principles, all WWW access is carried over the HTTP protocol. HTTP is a higher-layer application of the TCP/IP protocol and communicates in "short connection" mode; some also describe HTTP as a "connectionless" protocol. Each request for a website address may involve many TCP/IP exchanges: the client initiates a TCP/IP connection request and, once it has obtained the document at the URL, disconnects immediately rather than staying connected until the browser is closed. Even if the browser stays on a page for a long time, the TCP/IP connection is dropped as soon as the requested download finishes. The connection is not, as people might intuitively assume, dropped only when the browser is closed or navigates to another website.
Based on the above principle, the present invention performs abnormal access analysis in six steps. Step 1, determine the visitor type: decide whether a visitor is identified by IP, by IP+UserAgent, by Cookie, or by code embedded in the website's pages. Step 2, data cleansing: analyze and organize the naturally formed, disordered access logs or other access records, which are oriented to single URL requests; through analysis and identification, give the access records of the same user session the same session identifier (Session ID) and record the user sessions in an optimized data structure. Step 3, select the anomaly analysis index: decide whether anomaly analysis uses "number of URL requests", "traffic", or "server processing time" as the index. Step 4, set the "thresholds": set three different thresholds at the three levels T_r, T_s, and T_f. Step 5, anomaly analysis: analyze whether the access session duration ΔT is within the range ΔT_k and whether analysis index X exceeds the set "threshold"; a session that exceeds it is identified as "abnormal access". Step 6, anomaly type judgment: analyze the trend of index X within T_r; if the accesses measured by index X continue without interruption, the session is identified as a "persistent anomaly"; if index X increases suddenly within a T_s or T_f interval, that interval is identified as a "sudden anomaly". Two problems have stood in the way of such analysis. First, "abnormal access analysis" is built on an analysis of all visitors and all accessed pages, and at present there is no good method for mining access data on this "massive" scale. Second, how to determine the visitor, which index to use for anomaly analysis, and how to judge anomalies are all still being explored. The present invention fundamentally solves both problems.
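The three threshold levels of step 4 can be illustrated with a short Python sketch. It assumes that T_r is one day, T_s one hour, and T_f one minute, and that index X is the number of URL requests; the function name `intervals_over_threshold` and its parameters are hypothetical and not part of the patent.

```python
from collections import Counter
from datetime import datetime


def intervals_over_threshold(timestamps, per_day, per_hour, per_minute):
    """Return, for each level, the intervals whose request count exceeds its threshold.

    Assumption: T_r = one day, T_s = one hour, T_f = one minute.
    """
    levels = {
        "T_r": ("%Y-%m-%d", per_day),
        "T_s": ("%Y-%m-%d %H", per_hour),
        "T_f": ("%Y-%m-%d %H:%M", per_minute),
    }
    result = {}
    for name, (fmt, limit) in levels.items():
        # Index X per interval: count the requests that fall into each bucket.
        counts = Counter(ts.strftime(fmt) for ts in timestamps)
        result[name] = sorted(key for key, n in counts.items() if n > limit)
    return result
```

The sketch uses absolute counts as thresholds. The patent also allows an average value of index X to serve as the threshold, which would replace the fixed limits with a mean computed over the intervals.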
Description of drawings:
Fig. 1 is a flow chart of the method for analyzing abnormal website access;
Fig. 2 is a flow chart of determining the visitor type;
Fig. 3 is a flow chart of data cleansing;
Fig. 4 is a flow chart of selecting the analysis index;
Fig. 5 is a flow chart of setting the thresholds;
Fig. 6 is a trend chart of persistent abnormal access;
Fig. 7 is a trend chart of sudden abnormal access.
Embodiment:
The method for analyzing abnormal website access comprises the following steps:
(1) Determine the visitor type: the visitor-type determination "A" decides whether a visitor is identified by IP, by IP+UserAgent, by Cookie, or by code embedded in the website's pages;
(2) Data cleansing: the data cleansing program "B" reads the access records in the access data "C". Original access records that belong to the same visitor, are consecutive URL accesses, and are separated by gaps of no more than 30 minutes (or the time limit specified by the web server) are given the same Session ID, forming cleaned data "G" in which every record carries a Session ID;
(3) Select the analysis index: an analysis index "D" must be selected before anomaly analysis. The default analysis index is the number of URL requests; traffic or server processing time may be selected as the analysis index as required;
(4) Set the "thresholds": first, set the duration ΔT_k and the "threshold" of index X. The thresholds fall into three classes. The first class applies to the time range T_r covered by the whole of the access data: a value of index X or an average value of index X is set as the threshold (e.g., X per hour). The second class divides T_r into several equal sub-intervals T_s and sets a value or an average value of index X as the threshold (e.g., X per minute). The third class further divides T_s into several equal sub-intervals T_f and sets a value or an average value of index X as the threshold (e.g., X per second). Here T_r is the time range covered by the access data and may be one day, one week, one month, and so on. T_s is obtained by dividing T_r: for example, if T_r is one day, it can be divided by hour into 24 intervals. T_f is obtained by dividing T_s: for example, if T_s is one hour, it can be divided by minute into 60 intervals. Then, for each of the three levels T_r, T_s, and T_f, a threshold "E" is set for analysis index X according to the actual situation of the website. By default no threshold is set, which is equivalent to manual analysis;
(5) Anomaly analysis: the cleaned-data reader "F" reads data from the cleaned data "G". For each access session record, subtract the first access time T_1 in the session from the last access time T_2 to obtain the session duration ΔT = T_2 - T_1. If ΔT is within the range ΔT_k and analysis index X exceeds the "threshold" set in step (4), the abnormal-data storage program "H" stores the analysis result in the abnormal data "I";
(6) Anomaly type judgment: judge whether the abnormal data obtained in step (5) represent a persistent anomaly or a sudden anomaly. The abnormal access data are read from the abnormal data "I". If the website is accessed continuously within T_r, the session is identified as persistent abnormal access "J", whose trend chart is shown in Fig. 6. If index X increases suddenly within a T_s or T_f interval, the session is identified as sudden abnormal access "K", whose trend chart is shown in Fig. 7.
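Steps (5) and (6) can be sketched in Python as follows. This is only an illustration under stated assumptions: ΔT_k is modeled as a closed range `[dt_min, dt_max]`, and the text gives no numeric criteria for "continuous" access or a "sudden" increase, so `active_ratio` and `spike_factor` are hypothetical parameters, as are the names `Session`, `is_abnormal`, and `anomaly_type`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Session:
    session_id: str
    first_access: datetime  # T_1
    last_access: datetime   # T_2
    index_x: float          # analysis index X, e.g. number of URL requests


def is_abnormal(s: Session, dt_min: timedelta, dt_max: timedelta, threshold: float) -> bool:
    """Step (5): abnormal if the duration is within the ΔT_k range and X exceeds the threshold."""
    delta_t = s.last_access - s.first_access  # ΔT = T_2 - T_1
    return dt_min <= delta_t <= dt_max and s.index_x > threshold


def anomaly_type(counts, active_ratio=0.9, spike_factor=5.0) -> str:
    """Step (6): classify from the values of X per T_s sub-interval across T_r.

    Assumptions: "persistent" means at least `active_ratio` of the sub-intervals
    are active; "sudden" means one sub-interval is at least `spike_factor` times the mean.
    """
    if not counts or sum(counts) == 0:
        return "none"
    active = sum(1 for c in counts if c > 0) / len(counts)
    if active >= active_ratio:
        return "persistent"
    if max(counts) >= spike_factor * mean(counts):
        return "sudden"
    return "undetermined"
```

With hourly counts over one day, a session active in every hour is labeled "persistent", while a session whose requests all fall in one hour is labeled "sudden". In the patent the final judgment is made by a person reading the trend charts, so labels like these would at most pre-sort the sessions for review.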
The data cleansing steps, as shown in Fig. 3, are as follows:
(1) Read the access data "B1";
(2) Judge whether an access record is URL junk data; if so, the record is cleaned out (discarded);
(3) If the result of step (2) is no, the records with the same Session ID "B2" are stored in the cleaned data "G" in the optimized data structure.
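The Session ID assignment that this cleansing relies on (same visitor, consecutive accesses, gaps of no more than 30 minutes) might look like the following Python sketch. The visitor key built from IP+UserAgent is one of the options named in step (1); the record layout, the `visitor#n` ID format, and the function names are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def visitor_key(ip: str, user_agent: str) -> str:
    """Identify a visitor by IP+UserAgent (one of the options in step (1))."""
    return f"{ip}|{user_agent}"


def assign_sessions(records, timeout=timedelta(minutes=30)):
    """Give records of the same visitor the same Session ID while gaps stay within `timeout`.

    records: iterable of (visitor, timestamp, url) tuples.
    Returns a list of (session_id, visitor, timestamp, url) in time order.
    """
    last_seen = {}  # visitor -> (timestamp of last request, session number)
    out = []
    for visitor, ts, url in sorted(records, key=lambda r: r[1]):
        prev = last_seen.get(visitor)
        if prev is None:
            number = 1                # first request seen from this visitor
        elif ts - prev[0] <= timeout:
            number = prev[1]          # gap within the limit: same session
        else:
            number = prev[1] + 1      # gap too long: start a new session
        last_seen[visitor] = (ts, number)
        out.append((f"{visitor}#{number}", visitor, ts, url))
    return out
```

The first and last timestamps of each resulting session give T_1 and T_2 for the later anomaly analysis.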

Claims (2)

1. A method for analyzing abnormal website access, characterized in that the steps are as follows:
(1) Determine the visitor type: according to the actual situation of the website, determine whether a visitor is identified by IP, by IP+UserAgent, by Cookie, or by code embedded in the website's pages;
(2) Data cleansing: read the access log; analyze, clean, and filter the access session records and optimize the data structure. For the naturally formed access session records, which are oriented to single URL requests, through analysis and identification, records from the same visitor whose interval is less than the system-defined session time limit (Time Out) are given the same access session identifier (Session ID), forming cleaned data in which every record carries a Session ID; the cleaned data are stored in an optimized data structure;
(3) Select the anomaly analysis index: by default, "number of URL requests" is the anomaly analysis index X; "traffic" or "server processing time" may be selected as anomaly analysis index X as required;
(4) Set the "thresholds": set the session duration limit ΔT_k and the "threshold" of anomaly analysis index X;
(5) Anomaly analysis: read the cleaned data produced by the data cleansing program of step (2) and analyze each access session record. Subtract the first access time T_1 in the session from the last access time T_2 to obtain the session duration ΔT = T_2 - T_1. If ΔT is within the range ΔT_k and anomaly analysis index X exceeds the "threshold" set in step (4), the session is identified as "abnormal access" and the abnormal access data are stored;
(6) Anomaly type judgment: judge whether the abnormal access data obtained in step (5) represent a persistent anomaly or a sudden anomaly, and present the result in intuitive charts and tables.
2. The method for analyzing abnormal website access according to claim 1, characterized in that the "thresholds" in step (4) fall into three classes: the first class applies to the time range T_r covered by the whole of the access data, in which a value of anomaly analysis index X or an average value of anomaly analysis index X is set as the threshold; the second class divides T_r into several equal sub-intervals T_s, and within T_s a value of anomaly analysis index X or an average value of anomaly analysis index X is set as the threshold; the third class further divides T_s into several equal sub-intervals T_f, and within T_f a value of anomaly analysis index X or an average value of anomaly analysis index X is set as the threshold.
CN2008100104236A 2008-02-18 2008-02-18 Analytical method of website abnormal visit Expired - Fee Related CN101232399B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008100104236A CN101232399B (en) 2008-02-18 2008-02-18 Analytical method of website abnormal visit

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008100104236A CN101232399B (en) 2008-02-18 2008-02-18 Analytical method of website abnormal visit

Publications (2)

Publication Number Publication Date
CN101232399A CN101232399A (en) 2008-07-30
CN101232399B true CN101232399B (en) 2010-06-23

Family

ID=39898593

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008100104236A Expired - Fee Related CN101232399B (en) 2008-02-18 2008-02-18 Analytical method of website abnormal visit

Country Status (1)

Country Link
CN (1) CN101232399B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103379099B (en) * 2012-04-19 2017-08-04 阿里巴巴集团控股有限公司 Hostile attack identification method and system
CN102932207B (en) * 2012-11-19 2015-12-23 北京奇虎科技有限公司 The method of monitoring website access information and server
CN103327016B (en) * 2013-06-06 2016-06-22 合一信息技术(北京)有限公司 A kind of computing network Streaming Media exception playback volume the method and system to its correction
CN103401849B (en) * 2013-07-18 2017-02-15 盘石软件(上海)有限公司 Abnormal session analyzing method for website logs
CN103475543A (en) * 2013-09-11 2013-12-25 北京思特奇信息技术股份有限公司 Abnormal system service call detection method and system
CN103546326B (en) * 2013-11-04 2017-01-11 北京中搜网络技术股份有限公司 Website traffic statistic method
CN103605714B (en) * 2013-11-14 2017-10-03 北京国双科技有限公司 The recognition methods of website abnormal data and device
CN103593484A (en) * 2013-12-03 2014-02-19 南京安讯科技有限责任公司 Method for filtering garbage logs during mobile phone internet surfing
US9614853B2 (en) 2015-01-20 2017-04-04 Enzoo, Inc. Session security splitting and application profiler
CN104915455B (en) * 2015-07-02 2017-03-15 焦点科技股份有限公司 A kind of website abnormal based on user behavior accesses recognition methodss and system
CN106921628B (en) * 2015-12-25 2021-10-08 阿里巴巴集团控股有限公司 Method and device for identifying network access source based on network address
CN107508789B (en) * 2017-06-29 2020-04-07 北京北信源软件股份有限公司 Abnormal data identification method and device
CN109302297B (en) * 2017-07-25 2022-03-29 中国电信股份有限公司 Method and device for processing network access record and computer readable storage medium
CN111274516B (en) * 2018-12-04 2024-04-05 阿里巴巴新加坡控股有限公司 Page display method, page configuration method and device
CN115242606B (en) * 2022-06-21 2024-04-16 北京字跳网络技术有限公司 Data processing method, device, server, storage medium and program product

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1713598A (en) * 2004-06-25 2005-12-28 深圳市傲天通信有限公司 Shared access testing system of internet
CN1791022A (en) * 2005-12-26 2006-06-21 阿里巴巴公司 Log analyzing method and system

Also Published As

Publication number Publication date
CN101232399A (en) 2008-07-30

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20100623

Termination date: 20150218

EXPY Termination of patent right or utility model