CN106961444A - A malicious web crawler detection method based on a hidden Markov model - Google Patents
A malicious web crawler detection method based on a hidden Markov model
- Publication number: CN106961444A
- Application number: CN201710281763.1A
- Authority: CN (China)
- Prior art keywords: crawler, http, model, detection method, entropy
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/145—Network analysis or design involving simulating, designing, planning or modelling of a network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
- H04L63/145—Countermeasures against malicious traffic the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/02—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Virology (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
- Complex Calculations (AREA)
Abstract
The present invention proposes a malicious web crawler detection method based on a hidden Markov model (HMM) and belongs to the technical field of computer software. Malicious web crawlers scrape a website's sensitive information and private data without permission, and their aggressive traffic patterns also degrade the website's quality of service. Existing web crawler detection methods cannot accurately identify malicious crawlers and have a high misclassification rate. The invention therefore proposes a new HMM-based malicious web crawler detection method comprising: (1) an HMM-based model of user HTTP traffic, and (2) HTTP-based modeling of web crawler behavior.
Description
Technical field
The invention belongs to the technical field of computer software.
Background
Benign web crawlers are an indispensable part of search engines. A benign crawler generally takes its effect on a website's quality of service into account and strictly follows the website's data crawling rules. A malicious web crawler, by contrast, aims only to capture a website's valuable information. It disregards the harm its crawling does to the website, and may even violate the website's data protection statements and forcibly scrape sensitive information, leading to adverse consequences such as leaked user privacy and exposed trade secrets. Existing web crawler detection methods can only distinguish crawler traffic from ordinary user traffic; they struggle to tell benign crawlers from malicious ones.
Summary of the invention
The purpose of the invention is to provide a malicious web crawler detection method based on a hidden Markov model (HMM). Malicious web crawlers scrape a website's sensitive information and private data without permission, and their aggressive traffic patterns also degrade the website's quality of service. Existing web crawler detection methods cannot accurately identify malicious crawlers and have a high misclassification rate. The invention therefore proposes a new HMM-based malicious web crawler detection method comprising: (1) an HMM-based model of user HTTP traffic, and (2) HTTP-based modeling of web crawler behavior.
The technical solution is as follows:
1. Construction of the HTTP traffic behavior model
1.1 Basic definitions:
Observations: the resource type of each HTTP request is taken as the observation, so the observation sequence of HTTP traffic is written $o^c = (o^c_1, o^c_2, \ldots, o^c_T)$, where $o^c_t$ denotes the resource type requested by connection $c$ at time $t$. The observation space is $V = \{1, 2, \ldots, N\}$.
States: the state value is the page requested by connection $c$ at time $t$. The state sequence is written $y = (y_1, y_2, \ldots, y_T)$, and the state space is $S = \{1, 2, \ldots, M\}$.
The parametric model of HTTP traffic behavior is $\theta = \{\pi, A, B\}$, where $\pi$ is the initial state probability distribution, $A$ is the state transition probability matrix, and $B$ is the observation probability matrix.
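The definitions above can be illustrated with a minimal Python sketch. The resource-type vocabulary, the helper names (`encode_request`, `init_model`) and the random initialisation are assumptions made for illustration and are not taken from the patent; indices are 0-based here, whereas the text numbers them from 1.

```python
import os
import random
from urllib.parse import urlparse

# Hypothetical resource-type vocabulary: observation space V with N = 5 symbols.
RESOURCE_TYPES = {".html": 0, ".css": 1, ".js": 2, ".png": 3}
OTHER = 4  # any extension not listed above


def encode_request(url: str) -> int:
    """Map a requested URL to an observation index in {0, ..., N-1}."""
    path = urlparse(url).path
    ext = os.path.splitext(path)[1].lower()
    return RESOURCE_TYPES.get(ext, OTHER)


def random_stochastic_rows(rows: int, cols: int, rng: random.Random):
    """Return a rows x cols matrix whose rows are strictly positive and sum to 1."""
    matrix = []
    for _ in range(rows):
        weights = [rng.random() + 1e-3 for _ in range(cols)]
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return matrix


def init_model(num_states: int, num_symbols: int, seed: int = 0):
    """Randomly initialise theta = (pi, A, B) for M states and N symbols."""
    rng = random.Random(seed)
    pi = random_stochastic_rows(1, num_states, rng)[0]      # initial state probabilities
    A = random_stochastic_rows(num_states, num_states, rng)  # A[i][j] = P(state j | state i)
    B = random_stochastic_rows(num_states, num_symbols, rng)  # B[j][k] = P(symbol k | state j)
    return pi, A, B
```

A request log for one connection would then be encoded as `[encode_request(u) for u in urls]` before being passed to parameter estimation.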
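1.2 Parameter estimation for the HTTP traffic behavior model using the forward-backward algorithm
The parameter estimation task is to estimate the parameters of the hidden Markov model from the collected observation sequences. The invention solves this problem with the well-known forward-backward algorithm, described in detail below. Here $S_t$ denotes the state at time $t$, $o_{1:t}$ the observations $o_1, \ldots, o_t$, $a_{ij}$ the entries of $A$, and $b_j(o_t)$ the probability of observing $o_t$ in state $j$.
1) Define the forward and backward variables:
$\alpha_t(j) = P[S_t = j, o_{1:t} \mid \theta]$
$\beta_t(j) = P[o_{t+1:T} \mid S_t = j, \theta]$
2) Initialization:
$\alpha_1(j) = \pi_j b_j(o_1)$,
$\beta_T(j) = 1$.
3) Recursion:
$\alpha_{t+1}(j) = \left[\sum_{i=1}^{M} \alpha_t(i) a_{ij}\right] b_j(o_{t+1})$, for $t = 1, \ldots, T-1$
$\beta_t(i) = \sum_{j=1}^{M} a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)$, for $t = T-1, \ldots, 1$
4) Compute the intermediate variables:
$\xi_t(i, j) = P[S_t = i, S_{t+1} = j, o_{1:T} \mid \theta] = \alpha_t(i) a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)$
$\gamma_t(j) = P[S_t = j, o_{1:T} \mid \theta] = \alpha_t(j) \beta_t(j)$
5) Parameter update formulas:
$\hat{\pi}_j = \gamma_1(j) / \sum_{i=1}^{M} \gamma_1(i)$
$\hat{a}_{ij} = \sum_{t=1}^{T-1} \xi_t(i, j) / \sum_{t=1}^{T-1} \gamma_t(i)$
$\hat{b}_j(v_k) = \sum_{t=1}^{T} \gamma_t(j) I(o_t = v_k) / \sum_{t=1}^{T} \gamma_t(j)$
where $I(o_t = v_k) = 1$ when $o_t = v_k$, and $I(o_t = v_k) = 0$ otherwise.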
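As a rough illustration of the estimation procedure in this section, the following pure-Python sketch performs one re-estimation step on a single observation sequence. It is a sketch under assumptions rather than the patent's implementation: the per-step scaling used to avoid numerical underflow, the function names, and the 0-based symbol indices are all illustrative choices. It also assumes every symbol in the sequence has non-zero probability in at least one state.

```python
import math


def forward_backward(obs, pi, A, B):
    """Scaled forward-backward pass. Returns (alpha, beta, scale)."""
    T, M = len(obs), len(pi)
    alpha = [[0.0] * M for _ in range(T)]
    beta = [[1.0] * M for _ in range(T)]  # beta_T(j) = 1
    scale = [0.0] * T
    for t in range(T):
        for j in range(M):
            if t == 0:
                prior = pi[j]
            else:
                prior = sum(alpha[t - 1][i] * A[i][j] for i in range(M))
            alpha[t][j] = prior * B[j][obs[t]]
        scale[t] = sum(alpha[t])  # normaliser c_t; log P(o) = sum of log c_t
        alpha[t] = [a / scale[t] for a in alpha[t]]
    for t in range(T - 2, -1, -1):
        for i in range(M):
            beta[t][i] = sum(
                A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(M)
            ) / scale[t + 1]
    return alpha, beta, scale


def log_likelihood(obs, pi, A, B):
    """log P(o_{1:T} | theta), computed from the scaling factors."""
    _, _, scale = forward_backward(obs, pi, A, B)
    return sum(math.log(c) for c in scale)


def baum_welch_step(obs, pi, A, B):
    """One re-estimation step. Returns the updated (pi, A, B)."""
    T, M, N = len(obs), len(pi), len(B[0])
    alpha, beta, scale = forward_backward(obs, pi, A, B)
    # With scaling, alpha*beta is the posterior of state j at time t.
    gamma = [[alpha[t][j] * beta[t][j] for j in range(M)] for t in range(T)]
    new_pi = gamma[0][:]
    new_A = [[0.0] * M for _ in range(M)]
    for i in range(M):
        denom = sum(gamma[t][i] for t in range(T - 1))
        for j in range(M):
            # Sum over t of the posterior xi_t(i, j).
            num = sum(
                alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / scale[t + 1]
                for t in range(T - 1)
            )
            new_A[i][j] = num / denom
    new_B = [[0.0] * N for _ in range(M)]
    for j in range(M):
        denom = sum(gamma[t][j] for t in range(T))
        for k in range(N):
            # The indicator I(o_t = v_k) becomes the filter obs[t] == k.
            new_B[j][k] = sum(gamma[t][j] for t in range(T) if obs[t] == k) / denom
    return new_pi, new_A, new_B
```

Calling `baum_welch_step` repeatedly until `log_likelihood` stops improving gives a trained model. Training on several sequences would require accumulating the numerators and denominators across sequences, which this sketch omits.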
1.3 Web crawler detection
HTTP traffic consists of ordinary users' HTTP traffic and web crawlers' HTTP traffic. To detect malicious crawler traffic, crawler traffic must first be separated from ordinary user traffic. To this end, the invention identifies web crawler traffic by anomaly detection.
Compute the entropy of each observation sequence of ordinary users' HTTP traffic under the HTTP traffic behavior model.
Compute the standard deviation $\sigma_0$ and the mean $\mu_0$ of the entropy of the observation sequences of ordinary users' HTTP traffic.
To detect web crawlers, first compute the mean $\mu$ of the entropy of the monitored data sequences, then use $|\mu - \mu_0|$ as the anomaly detection statistic. If $|\mu - \mu_0| \ge 3\sigma_0$, the traffic is flagged as abnormal, that is, as web crawler traffic.
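The three-sigma test above can be sketched as follows. The entropy formula itself is not given in this text, so `sequence_entropy` ASSUMES that "entropy" means the average log-likelihood per observation, $(1/T)\log P[o_{1:T} \mid \theta]$; this definition, the function names, and the use of the population standard deviation are illustrative assumptions.

```python
import math
import statistics


def sequence_entropy(obs, pi, A, B):
    """ASSUMPTION: entropy = average log-likelihood per observation under theta."""
    M = len(pi)
    alpha = [pi[j] * B[j][obs[0]] for j in range(M)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(M)) * B[j][obs[t]]
                for j in range(M)
            ]
        c = sum(alpha)  # scaling factor; fails if the model gives o_t zero probability
        log_prob += math.log(c)
        alpha = [a / c for a in alpha]
    return log_prob / len(obs)


def fit_baseline(entropies):
    """Mean mu0 and standard deviation sigma0 of the reference entropies."""
    mu0 = statistics.fmean(entropies)
    sigma0 = statistics.pstdev(entropies)  # population form; the text does not say which
    return mu0, sigma0


def is_anomalous(monitored_entropies, mu0, sigma0):
    """Apply the rule |mu - mu0| >= 3 * sigma0 to the monitored sequences."""
    mu = statistics.fmean(monitored_entropies)
    return abs(mu - mu0) >= 3 * sigma0
```

In use, `fit_baseline` would be fed the entropies of ordinary users' training sequences, and `is_anomalous` the entropies of the sequences being monitored. A model with zero-probability symbols would need smoothing before scoring unseen traffic.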
1.4 Malicious web crawler detection
Compute the entropy of each observation sequence of benign web crawlers under the benign crawler behavior model.
Compute the standard deviation $\sigma_0$ and the mean $\mu_0$ of the entropy of the observation sequences of benign web crawlers.
To detect malicious web crawlers, first compute the mean $\mu$ of the entropy of the monitored data sequences, then use $|\mu - \mu_0|$ as the anomaly detection statistic. If $|\mu - \mu_0| \ge 3\sigma_0$, the traffic is flagged as abnormal, that is, as malicious web crawler traffic.
Brief description of the drawings
Fig. 1: Schematic diagram of the malicious web crawler detection model based on a hidden Markov model
Embodiment
Implementation procedure
Step 1: Preprocess the training data to generate the training dataset for web crawler detection;
Step 2: Estimate the model parameters with the forward-backward algorithm to obtain the HMM-based HTTP traffic model;
Step 3: Compute the entropy of the monitored data sequences using the trained model;
Step 4: Compute the traffic anomaly detection statistic $|\mu - \mu_0|$;
Step 5: Identify web crawler traffic by testing whether $|\mu - \mu_0| \ge 3\sigma_0$ holds;
Step 6: Extract the training dataset for benign crawler detection;
Step 7: Estimate the model parameters of benign web crawlers with the forward-backward algorithm;
Step 8: Compute the entropy of the web crawler sequences using the trained benign crawler model;
Step 9: Compute the anomaly detection statistic $|\mu - \mu_0|$;
Step 10: Identify malicious web crawler traffic by testing whether $|\mu - \mu_0| \ge 3\sigma_0$ holds.
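The ten steps amount to a two-stage cascade: traffic that deviates from the ordinary-user baseline is treated as crawler traffic, and crawler traffic that also deviates from the benign-crawler baseline is treated as malicious. A minimal sketch of that decision logic follows; the `Baseline` class, its `score` callable (standing in for the entropy computed by a trained model) and the label strings are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Baseline:
    """A trained model's scoring function plus its reference statistics."""
    score: Callable[[Sequence[int]], float]  # entropy of one sequence (hypothetical hook)
    mu0: float                               # mean entropy on the training data
    sigma0: float                            # standard deviation on the training data

    def deviates(self, sequences: List[Sequence[int]]) -> bool:
        """Steps 3-5 and 8-10: test |mu - mu0| >= 3 * sigma0."""
        mu = fmean([self.score(s) for s in sequences])
        return abs(mu - self.mu0) >= 3 * self.sigma0


def classify(sequences, user: Baseline, benign_crawler: Baseline) -> str:
    """Stage 1 separates crawlers from users; stage 2 separates malicious from benign."""
    if not user.deviates(sequences):
        return "ordinary user"
    if not benign_crawler.deviates(sequences):
        return "benign crawler"
    return "malicious crawler"
```

Keeping the two baselines as separate objects mirrors the text's reuse of the symbols $\mu_0$ and $\sigma_0$ for two different reference populations.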
Claims (3)
1. A malicious web crawler detection method based on a hidden Markov model, characterized by:
Construction of the HTTP traffic behavior model
1.1 Basic definitions:
Observations: the resource type of each HTTP request is taken as the observation, so the observation sequence of HTTP traffic is written $o^c = (o^c_1, o^c_2, \ldots, o^c_T)$, where $o^c_t$ denotes the resource type requested by connection $c$ at time $t$; the observation space is $V = \{1, 2, \ldots, N\}$;
States: the state value is the page requested by connection $c$ at time $t$; the state sequence is written $y = (y_1, y_2, \ldots, y_T)$, and the state space is $S = \{1, 2, \ldots, M\}$;
The parametric model of HTTP traffic behavior is $\theta = \{\pi, A, B\}$, where $\pi$ is the initial state probability distribution, $A$ is the state transition probability matrix, and $B$ is the observation probability matrix;
1.2 Parameter estimation for the HTTP traffic behavior model using the forward-backward algorithm
The parameter estimation task is to estimate the parameters of the hidden Markov model from the collected observation sequences; the invention solves this problem with the well-known forward-backward algorithm, described in detail below, where $S_t$ denotes the state at time $t$, $o_{1:t}$ the observations $o_1, \ldots, o_t$, $a_{ij}$ the entries of $A$, and $b_j(o_t)$ the probability of observing $o_t$ in state $j$;
1) Define the forward and backward variables:
$\alpha_t(j) = P[S_t = j, o_{1:t} \mid \theta]$
$\beta_t(j) = P[o_{t+1:T} \mid S_t = j, \theta]$
2) Initialization:
$\alpha_1(j) = \pi_j b_j(o_1)$,
$\beta_T(j) = 1$;
3) Recursion:
$\alpha_{t+1}(j) = \left[\sum_{i=1}^{M} \alpha_t(i) a_{ij}\right] b_j(o_{t+1})$, for $t = 1, \ldots, T-1$
$\beta_t(i) = \sum_{j=1}^{M} a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)$, for $t = T-1, \ldots, 1$
4) Compute the intermediate variables:
$\xi_t(i, j) = P[S_t = i, S_{t+1} = j, o_{1:T} \mid \theta] = \alpha_t(i) a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)$
$\gamma_t(j) = P[S_t = j, o_{1:T} \mid \theta] = \alpha_t(j) \beta_t(j)$
5) Parameter update formulas:
$\hat{\pi}_j = \gamma_1(j) / \sum_{i=1}^{M} \gamma_1(i)$
$\hat{a}_{ij} = \sum_{t=1}^{T-1} \xi_t(i, j) / \sum_{t=1}^{T-1} \gamma_t(i)$
$\hat{b}_j(v_k) = \sum_{t=1}^{T} \gamma_t(j) I(o_t = v_k) / \sum_{t=1}^{T} \gamma_t(j)$
where $I(o_t = v_k) = 1$ when $o_t = v_k$, and $I(o_t = v_k) = 0$ otherwise;
1.3 Web crawler detection
HTTP traffic consists of ordinary users' HTTP traffic and web crawlers' HTTP traffic. To detect malicious crawler traffic, crawler traffic must first be separated from ordinary user traffic. To this end, the invention identifies web crawler traffic by anomaly detection.
Compute the entropy of each observation sequence of ordinary users' HTTP traffic under the HTTP traffic behavior model.
Compute the standard deviation $\sigma_0$ and the mean $\mu_0$ of the entropy of the observation sequences of ordinary users' HTTP traffic.
To detect web crawlers, first compute the mean $\mu$ of the entropy of the monitored data sequences, then use $|\mu - \mu_0|$ as the anomaly detection statistic. If $|\mu - \mu_0| \ge 3\sigma_0$, the traffic is flagged as abnormal.
2. The malicious web crawler detection method based on a hidden Markov model according to claim 1, characterized by:
Construction of the HTTP traffic behavior model
1.4 Malicious web crawler detection
Compute the entropy of each observation sequence of benign web crawlers under the benign crawler behavior model.
Compute the standard deviation $\sigma_0$ and the mean $\mu_0$ of the entropy of the observation sequences of benign web crawlers.
To detect malicious web crawlers, first compute the mean $\mu$ of the entropy of the monitored data sequences, then use $|\mu - \mu_0|$ as the anomaly detection statistic. If $|\mu - \mu_0| \ge 3\sigma_0$, the traffic is flagged as abnormal.
3. The malicious web crawler detection method based on a hidden Markov model according to claim 1, characterized by:
Construction of the HTTP traffic behavior model
Implementation procedure
Step 1: Preprocess the training data to generate the training dataset for web crawler detection;
Step 2: Estimate the model parameters with the forward-backward algorithm to obtain the HMM-based HTTP traffic model;
Step 3: Compute the entropy of the monitored data sequences using the trained model;
Step 4: Compute the traffic anomaly detection statistic $|\mu - \mu_0|$;
Step 5: Identify web crawler traffic by testing whether $|\mu - \mu_0| \ge 3\sigma_0$ holds;
Step 6: Extract the training dataset for benign crawler detection;
Step 7: Estimate the model parameters of benign web crawlers with the forward-backward algorithm;
Step 8: Compute the entropy of the web crawler sequences using the trained benign crawler model;
Step 9: Compute the anomaly detection statistic $|\mu - \mu_0|$;
Step 10: Identify malicious web crawler traffic by testing whether $|\mu - \mu_0| \ge 3\sigma_0$ holds.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710281763.1A CN106961444A (en) | 2017-04-26 | 2017-04-26 | A malicious web crawler detection method based on a hidden Markov model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710281763.1A CN106961444A (en) | 2017-04-26 | 2017-04-26 | A malicious web crawler detection method based on a hidden Markov model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106961444A (en) | 2017-07-18 |
Family
ID=59484570
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710281763.1A Pending CN106961444A (en) | 2017-04-26 | 2017-04-26 | A kind of hostile network reptile detection method based on hidden Markov model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106961444A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107818179A (en) * | 2017-11-23 | 2018-03-20 | 成都知道创宇信息技术有限公司 | A kind of reptile recognition methods theoretical based on information content |
CN108536776A (en) * | 2018-03-28 | 2018-09-14 | 广州厚云信息科技有限公司 | Unification user malicious act detection method and system in a kind of social networks |
CN108900556A (en) * | 2018-08-24 | 2018-11-27 | 海南大学 | Ddos attack detection method based on HMM and chaotic model |
CN109525567A (en) * | 2018-11-01 | 2019-03-26 | 郑州云海信息技术有限公司 | A kind of detection method and system for implementing parameter injection attacks for website |
CN110245280A (en) * | 2019-05-06 | 2019-09-17 | 北京三快在线科技有限公司 | Identify method, apparatus, storage medium and the electronic equipment of web crawlers |
CN112398864A (en) * | 2020-11-19 | 2021-02-23 | 广东技术师范大学 | Vertical web crawler detection and identification method based on behavior balance degree |
US20210185086A1 (en) * | 2019-05-30 | 2021-06-17 | Morgan State University | Method and system for intrusion detection |
CN113281225A (en) * | 2021-04-30 | 2021-08-20 | 天津大学 | Method for identifying slug flow-mixed flow conversion boundary based on liquid film fluctuation analysis |
CN113868651A (en) * | 2021-09-27 | 2021-12-31 | 中国石油大学(华东) | Web log-based website anti-crawler method |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020080162A1 (en) * | 2000-11-02 | 2002-06-27 | Hao Pan | Method for automatic extraction of semantically significant events from video |
CN1658576A (en) * | 2005-03-09 | 2005-08-24 | 中山大学 | Detection and defence method for data flous of large network station |
CN101022403A (en) * | 2006-09-08 | 2007-08-22 | 中山大学 | State application blind identifying method |
CN102438025A (en) * | 2012-01-10 | 2012-05-02 | 中山大学 | Indirect distributed denial of service attack defense method and system based on Web agency |
CN102999789A (en) * | 2012-11-19 | 2013-03-27 | 浙江工商大学 | Digital city safety precaution method based on semi-hidden-markov model |
Non-Patent Citations (1)
Title |
---|
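Qi Zhihui (漆志辉): "Performance improvement and application of topic crawlers based on hidden Markov models" (基于隐马尔科夫模型的主题爬虫性能提高与应用), China Master's Theses Full-text Database, Information Science and Technology (中国优秀硕士学位论文全文数据库.信息科技辑) * |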
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107818179A (en) * | 2017-11-23 | 2018-03-20 | 成都知道创宇信息技术有限公司 | A kind of reptile recognition methods theoretical based on information content |
CN107818179B (en) * | 2017-11-23 | 2021-06-18 | 成都知道创宇信息技术有限公司 | Crawler identification method based on information quantity theory |
CN108536776A (en) * | 2018-03-28 | 2018-09-14 | 广州厚云信息科技有限公司 | Unification user malicious act detection method and system in a kind of social networks |
CN108900556A (en) * | 2018-08-24 | 2018-11-27 | 海南大学 | Ddos attack detection method based on HMM and chaotic model |
CN108900556B (en) * | 2018-08-24 | 2021-02-02 | 海南大学 | DDoS attack detection method based on HMM and chaotic model |
CN109525567A (en) * | 2018-11-01 | 2019-03-26 | 郑州云海信息技术有限公司 | A kind of detection method and system for implementing parameter injection attacks for website |
CN110245280A (en) * | 2019-05-06 | 2019-09-17 | 北京三快在线科技有限公司 | Identify method, apparatus, storage medium and the electronic equipment of web crawlers |
CN110245280B (en) * | 2019-05-06 | 2021-03-02 | 北京三快在线科技有限公司 | Method and device for identifying web crawler, storage medium and electronic equipment |
US20210185086A1 (en) * | 2019-05-30 | 2021-06-17 | Morgan State University | Method and system for intrusion detection |
US11595434B2 (en) * | 2019-05-30 | 2023-02-28 | Morgan State University | Method and system for intrusion detection |
CN112398864A (en) * | 2020-11-19 | 2021-02-23 | 广东技术师范大学 | Vertical web crawler detection and identification method based on behavior balance degree |
CN112398864B (en) * | 2020-11-19 | 2022-08-30 | 广东技术师范大学 | Vertical web crawler detection and identification method based on behavior balance degree |
CN113281225A (en) * | 2021-04-30 | 2021-08-20 | 天津大学 | Method for identifying slug flow-mixed flow conversion boundary based on liquid film fluctuation analysis |
CN113281225B (en) * | 2021-04-30 | 2022-06-14 | 天津大学 | Method for identifying slug flow-mixed flow conversion boundary based on liquid film fluctuation analysis |
CN113868651A (en) * | 2021-09-27 | 2021-12-31 | 中国石油大学(华东) | Web log-based website anti-crawler method |
CN113868651B (en) * | 2021-09-27 | 2024-04-26 | 中国石油大学(华东) | Web log-based website anticreeper method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106961444A (en) | A malicious web crawler detection method based on a hidden Markov model | |
Zhang et al. | A fuzzy probability Bayesian network approach for dynamic cybersecurity risk assessment in industrial control systems | |
CN110851835B (en) | Image model detection method and device, electronic equipment and storage medium | |
CN107154950B (en) | Method and system for detecting log stream abnormity | |
JP6106340B2 (en) | Log analysis device, attack detection device, attack detection method and program | |
WO2019128529A1 (en) | Url attack detection method and apparatus, and electronic device | |
TWI547823B (en) | Method and system for analyzing malicious code, data processing apparatus and electronic apparatus | |
JP6557774B2 (en) | Graph-based intrusion detection using process trace | |
WO2019127834A1 (en) | Transaction event processing method and device, terminal apparatus, and medium | |
CN104008332A (en) | Intrusion detection system based on Android platform | |
KR20160095856A (en) | System and method for detecting intrusion intelligently based on automatic detection of new attack type and update of attack type | |
CN101635658B (en) | Method and system for detecting abnormality of network secret stealing behavior | |
Garg et al. | Profiling users in GUI based systems for masquerade detection | |
JP2008546264A5 (en) | ||
CN104601556A (en) | Attack detection method and system for WEB | |
Shezan et al. | Read between the lines: An empirical measurement of sensitive applications of voice personal assistant systems | |
CN101686239A (en) | Trojan discovery system | |
Srivastav et al. | Novel intrusion detection system integrating layered framework with neural network | |
JP6174520B2 (en) | Malignant communication pattern detection device, malignant communication pattern detection method, and malignant communication pattern detection program | |
Traore et al. | Online risk-based authentication using behavioral biometrics | |
CN106973047A (en) | A kind of anomalous traffic detection method and device | |
CN103957205A (en) | Trojan horse detection method based on terminal traffic | |
RU2666644C1 (en) | System and method of identifying potentially hazardous devices at user interaction with bank services | |
CN103699823A (en) | Identity authentication system based on user behavior pattern and method thereof | |
CN114785567B (en) | Flow identification method, device, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20170718 |