CN106961444A - Malicious web crawler detection method based on a hidden Markov model - Google Patents

Malicious web crawler detection method based on a hidden Markov model

Info

Publication number
CN106961444A
Authority
CN
China
Prior art keywords
crawler
http
model
detection method
entropy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710281763.1A
Other languages
Chinese (zh)
Inventor
罗日红
蔡君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Yi Rong Agel Ecommerce Ltd
Guangdong Polytechnic Normal University
Original Assignee
Guangdong Yi Rong Agel Ecommerce Ltd
Guangdong Polytechnic Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Yi Rong Agel Ecommerce Ltd, Guangdong Polytechnic Normal University
Priority to CN201710281763.1A
Publication of CN106961444A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/145 Countermeasures against malicious traffic the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Complex Calculations (AREA)

Abstract

The present invention proposes a malicious web crawler detection method based on a hidden Markov model, and belongs to the technical field of computer software. Malicious web crawlers scrape a website's sensitive information and private data without permission, and their aggressive traffic patterns also degrade the website's quality of service. Existing web crawler detection methods cannot accurately identify malicious web crawlers and have a high misclassification rate. The present invention therefore proposes a new malicious web crawler detection method based on a hidden Markov model, comprising: (1) HMM-based modeling of user HTTP traffic; (2) HTTP-based modeling of web crawler behavior.

Description

Malicious web crawler detection method based on a hidden Markov model
Technical field
The invention belongs to the technical field of computer software.
Background
Benign web crawlers are an indispensable part of search engines. A benign web crawler generally takes its impact on the website's quality of service into account and strictly observes the website's data-crawling rules. A malicious web crawler, by contrast, aims to scrape a website's useful information. It ignores the harmful effect of its crawling on the website, and may even violate the website's data-protection statement and forcibly scrape sensitive information, leading to adverse consequences such as leakage of user privacy and exposure of trade secrets. Existing web crawler detection methods can only distinguish crawler traffic from ordinary user traffic; they have difficulty distinguishing benign web crawlers from malicious ones.
Summary of the invention
The purpose of the present invention is to propose a malicious web crawler detection method based on a hidden Markov model. Malicious web crawlers scrape a website's sensitive information and private data without permission, and their aggressive traffic patterns also degrade the website's quality of service. Existing web crawler detection methods cannot accurately identify malicious web crawlers and have a high misclassification rate. The present invention therefore proposes a new malicious web crawler detection method based on a hidden Markov model, comprising: (1) HMM-based modeling of user HTTP traffic; (2) HTTP-based modeling of web crawler behavior.
The technical scheme is as follows:
1. Construction method of the HTTP traffic behavior model
1.1 Basic definitions:
Observations: The resource type of each HTTP request is taken as the observation, so the observation sequence of HTTP traffic is expressed as $o^c = o^c_1, o^c_2, \ldots, o^c_T$, where $o^c_t$ denotes the resource type requested by the c-th connection at time t. The observation space is $V = \{1, 2, \ldots, N\}$.
States: The state is the page requested by connection c at time t, expressed as $y = y_1, y_2, \ldots, y_T$; the state space is $S = \{1, 2, \ldots, M\}$.
The parameter model of HTTP traffic behavior is expressed as $\theta = \{\pi, A, B\}$, where $\pi$ is the initial state probability distribution of the model, $A$ is the state transition probability matrix, and $B$ is the observation probability matrix.
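The mapping from HTTP requests to observation symbols can be sketched as follows. This is only an illustration under stated assumptions: the vocabulary `RESOURCE_TYPES`, the catch-all symbol `OTHER`, the function name `observation_sequence`, and the choice to classify by file extension are all hypothetical, since the text only states that observations are resource types drawn from V = {1, ..., N}.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical resource-type vocabulary (an assumption, not from the patent).
RESOURCE_TYPES = {
    "": 1, ".html": 1, ".htm": 1,                 # pages (no extension counts as a page)
    ".css": 2,                                    # style sheets
    ".js": 3,                                     # scripts
    ".png": 4, ".jpg": 4, ".jpeg": 4, ".gif": 4,  # images
}
OTHER = 5  # every extension not listed above, so N = 5 in this sketch


def observation_sequence(urls):
    """Map the request URLs of one connection to observation symbols in V."""
    sequence = []
    for url in urls:
        path = urlparse(url).path                      # drop scheme, host, query
        extension = posixpath.splitext(path)[1].lower()
        sequence.append(RESOURCE_TYPES.get(extension, OTHER))
    return sequence
```

In practice the resource type might instead be taken from the `Content-Type` response header; the extension-based rule is used here only because it keeps the sketch self-contained.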
1.2 Parameter estimation for the HTTP traffic behavior model based on the forward-backward algorithm
The parameter estimation task for the HTTP traffic behavior model is to estimate the parameters of the corresponding hidden semi-Markov model from the collected observation sequences. The present invention solves the parameter estimation problem of the HTTP traffic behavior model with the well-known forward-backward algorithm, described in detail below.
1) Define the forward and backward variables:
$\alpha_t(j) = P[S_t = j, o_{1:t} \mid \theta]$
$\beta_t(j) = P[o_{t+1:T} \mid S_t = j, \theta]$
2) Initialization of the forward-backward algorithm:
$\alpha_1(j) = \pi_j b_j(o_1)$,
$\beta_T(j) = 1$.
3) Recursion:
$\alpha_{t+1}(j) = \left[\sum_{i=1}^{M} \alpha_t(i) a_{ij}\right] b_j(o_{t+1})$
$\beta_t(i) = \sum_{j=1}^{M} a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)$
4) Compute the intermediate variable:
$\xi_t(i, j) = P[S_t = i, S_{t+1} = j, o_{1:T} \mid \theta] = \alpha_t(i) a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)$
5) Parameter update formulas:
where $I(o_t = v_k) = 1$ when $o_t = v_k$, and $I(o_t = v_k) = 0$ otherwise.
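As a minimal sketch of how the forward and backward variables drive parameter estimation, the pure-Python code below runs one Baum-Welch re-estimation step for a single observation sequence. Several things here are assumptions rather than the patent's own formulas: the textbook re-estimation rules for $\pi$, $A$ and $B$, the function names, the per-step scaling (added for numerical stability), the 0-based symbol indices, and the initialization $\alpha_1(j) = \pi_j b_j(o_1)$, which is what the definition of $\alpha_t(j)$ gives at t = 1.

```python
import math


def forward_backward(obs, pi, A, B):
    """Scaled forward-backward pass; obs holds 0-based symbol indices."""
    M, T = len(pi), len(obs)
    alpha = [[0.0] * M for _ in range(T)]
    beta = [[1.0] * M for _ in range(T)]   # beta_T(j) = 1
    scale = [0.0] * T
    for t in range(T):
        for j in range(M):
            if t == 0:
                prior = pi[j]
            else:
                prior = sum(alpha[t - 1][i] * A[i][j] for i in range(M))
            alpha[t][j] = prior * B[j][obs[t]]
        scale[t] = sum(alpha[t])           # equals P(o_t | o_1..o_{t-1})
        alpha[t] = [a / scale[t] for a in alpha[t]]
    for t in range(T - 2, -1, -1):
        for i in range(M):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(M)) / scale[t + 1]
    return alpha, beta, scale


def log_likelihood(obs, pi, A, B):
    """log P(obs | theta), recovered from the scaling factors."""
    return sum(math.log(c) for c in forward_backward(obs, pi, A, B)[2])


def baum_welch_step(obs, pi, A, B):
    """One re-estimation step (textbook update rules, assumed here)."""
    M, N, T = len(pi), len(B[0]), len(obs)
    alpha, beta, scale = forward_backward(obs, pi, A, B)
    # gamma[t][i] is the posterior probability of state i at time t.
    gamma = [[alpha[t][i] * beta[t][i] for i in range(M)] for t in range(T)]
    new_pi = gamma[0][:]
    new_A = [[0.0] * M for _ in range(M)]
    new_B = [[0.0] * N for _ in range(M)]
    for i in range(M):
        visits = sum(gamma[t][i] for t in range(T - 1))
        for j in range(M):
            # Sum over t of the normalized xi_t(i, j).
            xi_sum = sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                         * beta[t + 1][j] / scale[t + 1] for t in range(T - 1))
            new_A[i][j] = xi_sum / visits
        total = sum(gamma[t][i] for t in range(T))
        for k in range(N):
            # The condition obs[t] == k plays the role of I(o_t = v_k).
            new_B[i][k] = sum(gamma[t][i] for t in range(T) if obs[t] == k) / total
    return new_pi, new_A, new_B
```

Repeating `baum_welch_step` until `log_likelihood` stops improving gives a trained model. The sketch assumes strictly positive initial parameters; a state that is never visited would cause a division by zero.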
1.3 Web crawler detection method
HTTP traffic comprises the HTTP traffic of ordinary users and the HTTP traffic of web crawlers. To detect malicious web crawler traffic, web crawler traffic must first be separated from ordinary user traffic; to this end, the present invention identifies web crawler traffic by means of anomaly detection.
Compute the entropy of the observation sequences under the HTTP traffic behavior model of ordinary users:
Let the standard deviation of the entropy of the observation sequences of ordinary users' HTTP traffic be $\sigma_0$ and its mean be $\mu_0$.
When detecting web crawlers, first compute the mean $\mu$ of the entropy of the monitored sequences, then take $|\mu - \mu_0|$ as the anomaly detection statistic; if $|\mu - \mu_0| \ge 3\sigma_0$, the traffic is judged abnormal.
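The three-sigma test described above might look like the following sketch. It assumes the per-sequence entropy values have already been computed under the trained ordinary-user model, because the entropy formula itself is not reproduced in this text. The function names and the use of the population standard deviation (`statistics.pstdev`) are assumptions.

```python
import statistics


def fit_baseline(training_entropies):
    """Return (mu0, sigma0) for the entropy values of ordinary-user sequences."""
    mu0 = statistics.mean(training_entropies)
    sigma0 = statistics.pstdev(training_entropies)  # population std, an assumption
    return mu0, sigma0


def is_abnormal(monitored_entropies, mu0, sigma0):
    """Apply the test |mu - mu0| >= 3 * sigma0 to the monitored sequences."""
    mu = statistics.mean(monitored_entropies)
    return abs(mu - mu0) >= 3 * sigma0
```

For example, with `mu0, sigma0 = fit_baseline(user_entropies)`, a call to `is_abnormal(window_entropies, mu0, sigma0)` that returns `True` marks the monitored window as web crawler traffic; `user_entropies` and `window_entropies` are placeholder names.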
1.4 Malicious web crawler detection method
Compute the entropy of the observation sequences under the benign web crawler behavior model:
Let the standard deviation of the entropy of the observation sequences of benign web crawlers be $\sigma_0$ and its mean be $\mu_0$.
When detecting malicious web crawlers, first compute the mean $\mu$ of the entropy of the monitored sequences, then take $|\mu - \mu_0|$ as the anomaly detection statistic; if $|\mu - \mu_0| \ge 3\sigma_0$, the traffic is judged abnormal.
Brief description of the drawings
Fig. 1 is a schematic diagram of the malicious web crawler detection model based on a hidden Markov model.
Embodiment
Implementation procedure
Step 1: Preprocess the training data to generate the training dataset for web crawler detection;
Step 2: Estimate the model parameters with the forward-backward algorithm to obtain the HMM-based HTTP traffic model;
Step 3: Use the trained model to compute the entropy of the monitored sequences;
Step 4: Compute the traffic anomaly detection statistic $|\mu - \mu_0|$;
Step 5: Identify web crawler traffic by checking whether $|\mu - \mu_0| \ge 3\sigma_0$ holds;
Step 6: Extract the training dataset for benign crawler detection;
Step 7: Estimate the model parameters of benign web crawlers with the forward-backward algorithm;
Step 8: Use the trained benign web crawler model to compute the entropy of the web crawler sequences;
Step 9: Compute the anomaly detection statistic $|\mu - \mu_0|$;
Step 10: Identify malicious web crawler traffic by checking whether $|\mu - \mu_0| \ge 3\sigma_0$ holds.
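The ten steps amount to a two-stage cascade: traffic that looks abnormal under the ordinary-user model is treated as crawler traffic, and crawler traffic that also looks abnormal under the benign-crawler model is treated as malicious. The sketch below shows only the decision logic of steps 4-5 and 9-10. It assumes the mean entropies and the two baselines were obtained in the earlier steps, and the names `Baseline` and `classify` and the returned labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    """Mean and standard deviation of entropy for one trained model."""
    mu0: float
    sigma0: float

    def is_abnormal(self, mu: float) -> bool:
        # The test |mu - mu0| >= 3 * sigma0 from steps 5 and 10.
        return abs(mu - self.mu0) >= 3 * self.sigma0


def classify(mu_user_model, mu_crawler_model, user_baseline, benign_baseline):
    """Two-stage decision.

    mu_user_model:    mean entropy of the monitored sequences under the user model
    mu_crawler_model: mean entropy of the same sequences under the benign-crawler model
    """
    if not user_baseline.is_abnormal(mu_user_model):
        return "ordinary user"
    if not benign_baseline.is_abnormal(mu_crawler_model):
        return "benign crawler"
    return "malicious crawler"
```

Keeping the two baselines as separate objects reflects the text's reuse of the symbols $\mu_0$ and $\sigma_0$ for two different models.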

Claims (3)

1. A malicious web crawler detection method based on a hidden Markov model, characterized by:
Construction method of the HTTP traffic behavior model
1.1 Basic definitions:
Observations: The resource type of each HTTP request is taken as the observation, so the observation sequence of HTTP traffic is expressed as $o^c = o^c_1, o^c_2, \ldots, o^c_T$, where $o^c_t$ denotes the resource type requested by the c-th connection at time t. The observation space is $V = \{1, 2, \ldots, N\}$;
States: The state is the page requested by connection c at time t, expressed as $y = y_1, y_2, \ldots, y_T$; the state space is $S = \{1, 2, \ldots, M\}$;
The parameter model of HTTP traffic behavior is expressed as $\theta = \{\pi, A, B\}$, where $\pi$ is the initial state probability distribution of the model, $A$ is the state transition probability matrix, and $B$ is the observation probability matrix;
1.2 Parameter estimation for the HTTP traffic behavior model based on the forward-backward algorithm
The parameter estimation task for the HTTP traffic behavior model is to estimate the parameters of the corresponding hidden semi-Markov model from the collected observation sequences; the present invention solves the parameter estimation problem of the HTTP traffic behavior model with the well-known forward-backward algorithm, described in detail below;
1) Define the forward and backward variables:
$\alpha_t(j) = P[S_t = j, o_{1:t} \mid \theta]$
$\beta_t(j) = P[o_{t+1:T} \mid S_t = j, \theta]$
2) Initialization of the forward-backward algorithm:
$\alpha_1(j) = \pi_j b_j(o_1)$,
$\beta_T(j) = 1$;
3) Recursion:
$\alpha_{t+1}(j) = \left[\sum_{i=1}^{M} \alpha_t(i) a_{ij}\right] b_j(o_{t+1})$
$\beta_t(i) = \sum_{j=1}^{M} a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)$
4) Compute the intermediate variable:
$\xi_t(i, j) = P[S_t = i, S_{t+1} = j, o_{1:T} \mid \theta] = \alpha_t(i) a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)$
5) Parameter update formulas:
where $I(o_t = v_k) = 1$ when $o_t = v_k$, and $I(o_t = v_k) = 0$ otherwise;
1.3 Web crawler detection method
HTTP traffic comprises the HTTP traffic of ordinary users and the HTTP traffic of web crawlers. To detect malicious web crawler traffic, web crawler traffic must first be separated from ordinary user traffic; to this end, the present invention identifies web crawler traffic by means of anomaly detection.
Compute the entropy of the observation sequences under the HTTP traffic behavior model of ordinary users:
Let the standard deviation of the entropy of the observation sequences of ordinary users' HTTP traffic be $\sigma_0$ and its mean be $\mu_0$.
When detecting web crawlers, first compute the mean $\mu$ of the entropy of the monitored sequences, then take $|\mu - \mu_0|$ as the anomaly detection statistic; if $|\mu - \mu_0| \ge 3\sigma_0$, the traffic is judged abnormal.
2. The malicious web crawler detection method based on a hidden Markov model according to claim 1, characterized by:
Construction method of the HTTP traffic behavior model
1.4 Malicious web crawler detection method
Compute the entropy of the observation sequences under the benign web crawler behavior model:
Let the standard deviation of the entropy of the observation sequences of benign web crawlers be $\sigma_0$ and its mean be $\mu_0$.
When detecting malicious web crawlers, first compute the mean $\mu$ of the entropy of the monitored sequences, then take $|\mu - \mu_0|$ as the anomaly detection statistic; if $|\mu - \mu_0| \ge 3\sigma_0$, the traffic is judged abnormal.
3. The malicious web crawler detection method based on a hidden Markov model according to claim 1, characterized by:
Construction method of the HTTP traffic behavior model
Implementation procedure
Step 1: Preprocess the training data to generate the training dataset for web crawler detection;
Step 2: Estimate the model parameters with the forward-backward algorithm to obtain the HMM-based HTTP traffic model;
Step 3: Use the trained model to compute the entropy of the monitored sequences;
Step 4: Compute the traffic anomaly detection statistic $|\mu - \mu_0|$;
Step 5: Identify web crawler traffic by checking whether $|\mu - \mu_0| \ge 3\sigma_0$ holds;
Step 6: Extract the training dataset for benign crawler detection;
Step 7: Estimate the model parameters of benign web crawlers with the forward-backward algorithm;
Step 8: Use the trained benign web crawler model to compute the entropy of the web crawler sequences;
Step 9: Compute the anomaly detection statistic $|\mu - \mu_0|$;
Step 10: Identify malicious web crawler traffic by checking whether $|\mu - \mu_0| \ge 3\sigma_0$ holds.
CN201710281763.1A 2017-04-26 2017-04-26 Malicious web crawler detection method based on a hidden Markov model Pending CN106961444A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710281763.1A CN106961444A (en) 2017-04-26 2017-04-26 Malicious web crawler detection method based on a hidden Markov model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710281763.1A CN106961444A (en) 2017-04-26 2017-04-26 Malicious web crawler detection method based on a hidden Markov model

Publications (1)

Publication Number Publication Date
CN106961444A (en) 2017-07-18

Family

ID=59484570

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710281763.1A Pending CN106961444A (en) 2017-04-26 2017-04-26 Malicious web crawler detection method based on a hidden Markov model

Country Status (1)

Country Link
CN (1) CN106961444A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020080162A1 (en) * 2000-11-02 2002-06-27 Hao Pan Method for automatic extraction of semantically significant events from video
CN1658576A (en) * 2005-03-09 2005-08-24 中山大学 Detection and defence method for data flows of large network station
CN101022403A (en) * 2006-09-08 2007-08-22 中山大学 State application blind identifying method
CN102438025A (en) * 2012-01-10 2012-05-02 中山大学 Indirect distributed denial of service attack defense method and system based on Web agency
CN102999789A (en) * 2012-11-19 2013-03-27 浙江工商大学 Digital city safety precaution method based on semi-hidden-markov model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
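漆志辉: "Performance improvement and application of focused crawlers based on hidden Markov models", China Master's Theses Full-text Database, Information Science and Technology *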

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107818179A (en) * 2017-11-23 2018-03-20 成都知道创宇信息技术有限公司 Crawler identification method based on information quantity theory
CN107818179B (en) * 2017-11-23 2021-06-18 成都知道创宇信息技术有限公司 Crawler identification method based on information quantity theory
CN108536776A (en) * 2018-03-28 2018-09-14 广州厚云信息科技有限公司 Method and system for detecting malicious behavior of unified users in a social network
CN108900556A (en) * 2018-08-24 2018-11-27 海南大学 DDoS attack detection method based on HMM and chaotic model
CN108900556B (en) * 2018-08-24 2021-02-02 海南大学 DDoS attack detection method based on HMM and chaotic model
CN109525567A (en) * 2018-11-01 2019-03-26 郑州云海信息技术有限公司 Detection method and system for parameter injection attacks against websites
CN110245280A (en) * 2019-05-06 2019-09-17 北京三快在线科技有限公司 Method and device for identifying web crawler, storage medium and electronic equipment
CN110245280B (en) * 2019-05-06 2021-03-02 北京三快在线科技有限公司 Method and device for identifying web crawler, storage medium and electronic equipment
US20210185086A1 (en) * 2019-05-30 2021-06-17 Morgan State University Method and system for intrusion detection
US11595434B2 (en) * 2019-05-30 2023-02-28 Morgan State University Method and system for intrusion detection
CN112398864A (en) * 2020-11-19 2021-02-23 广东技术师范大学 Vertical web crawler detection and identification method based on behavior balance degree
CN112398864B (en) * 2020-11-19 2022-08-30 广东技术师范大学 Vertical web crawler detection and identification method based on behavior balance degree
CN113281225A (en) * 2021-04-30 2021-08-20 天津大学 Method for identifying slug flow-mixed flow conversion boundary based on liquid film fluctuation analysis
CN113281225B (en) * 2021-04-30 2022-06-14 天津大学 Method for identifying slug flow-mixed flow conversion boundary based on liquid film fluctuation analysis
CN113868651A (en) * 2021-09-27 2021-12-31 中国石油大学(华东) Web log-based website anti-crawler method
CN113868651B (en) * 2021-09-27 2024-04-26 中国石油大学(华东) Web log-based website anticreeper method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170718
