CN103440454A - Search engine keyword-based active honeypot detection method - Google Patents

Publication number
CN103440454A
CN103440454A CN2013103327307A CN201310332730A CN103440454A CN 103440454 A CN103440454 A CN 103440454A CN 2013103327307 A CN2013103327307 A CN 2013103327307A CN 201310332730 A CN201310332730 A CN 201310332730A CN 103440454 A CN103440454 A CN 103440454A
Authority
CN
China
Prior art keywords
search engine
honeypot
malicious
engine
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013103327307A
Other languages
Chinese (zh)
Other versions
CN103440454B (en)
Inventor
邹福泰
白巍
潘道欣
易平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN201310332730.7A
Publication of CN103440454A
Application granted
Publication of CN103440454B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an active honeypot detection method based on search engine keywords. First, corresponding honeypot web pages are constructed automatically from a library of known malicious search engine keywords: for a malicious search engine keyword that targets a URL (Uniform Resource Locator) path, the address rewriting technique of the Apache HTTP Server engine is used to construct the corresponding honeypot web page; for a malicious search engine keyword that targets web page content, the keyword is resubmitted to a search engine and the returned web pages are used as honeypot web pages. Second, the honeypot web pages are indexed by the search engine. Finally, based on the malicious access records of the honeypot web pages, a data mining algorithm extracts new malicious search engine keywords, which are merged into the malicious search engine keyword library, and new honeypot web pages are constructed. The method greatly increases the detection efficiency of honeypots and overcomes the passivity of conventional honeypots; moreover, the honeypot web pages can be updated dynamically to obtain up-to-date information on the vulnerabilities hackers are attacking.

Description

An active honeypot detection method based on search engine keywords
Technical field
The present invention relates to an active honeypot detection method, and in particular to an active honeypot detection method based on search engine keywords.
Background
Hacker attacks typically build on the discovery of a vulnerability in a system or network, and new attack methods keep emerging for new vulnerabilities. To test new vulnerabilities and attack methods, hackers often use search engines to find websites on the Internet that may have a particular vulnerability, and then attack them. Some hackers also write tools that scan for a particular vulnerability and intrude automatically, using search engines to scan and attack, on a large scale, every website on the Internet that may have that vulnerability. In recent years, search-engine-assisted attacks have become an important attack technique.
A honeypot is a decoy system that contains vulnerabilities. It is designed specifically to attract and lure hackers: by simulating one or more vulnerable hosts, it offers hackers an easy target. Because a honeypot provides no genuinely valuable service to the outside world, every attempt to access it is treated as suspicious. Another use of a honeypot is to delay attacks on real targets by making hackers waste time on the honeypot.
Honeypots are divided into real-system honeypots and simulated-system honeypots. A real-system honeypot runs a real system with vulnerabilities that can actually be exploited; such vulnerabilities are the most dangerous, but the intrusion information recorded is also the most authentic. A simulated-system honeypot is likewise built on a real system, but it uses the emulation capabilities of certain tools to fake vulnerabilities that the system does not actually have. An intruder who exploits such a vulnerability merely goes around in circles inside a program framework. Such a honeypot can prevent damage by intruders to the greatest extent, and can also simulate nonexistent vulnerabilities to confuse hackers.
If honeypots can be simulated according to the keywords hackers search for, deployed on the Internet, indexed by well-known search engines and, with the help of search engine optimization techniques, presented to hackers to lure attacks, then the honeypots actively attract attacks and their detection effectiveness improves greatly. The honeypots can also be updated continually according to the new hacker search keywords that appear every day, so that the honeypot content stays in step with hackers' attack techniques.
Therefore, those skilled in the art have been working to develop an active honeypot detection method based on search engine keywords that makes up for the passive waiting of conventional honeypots, lures hacker attacks more actively, and continually updates honeypot content to keep it in step with the latest hacking techniques.
Summary of the invention
In view of the above defects of the prior art, the technical problem to be solved by the present invention is to provide an active honeypot detection method based on search engine keywords.
To achieve the above object, the present invention provides an active honeypot detection method based on search engine keywords, characterized by comprising the following steps:
Step (101): automatically construct honeypot web pages using a library of malicious search engine keywords that have been collected and identified;
Step (102): have the constructed honeypot web pages indexed by a search engine using search engine ranking optimization techniques, and raise their ranking in the search engine so as to actively attract hackers to visit;
Step (103): use a data mining algorithm to extract new malicious search engine keywords from the access records of the honeypot web pages, merge the extracted new keywords into the malicious search engine keyword library, and return to step (101).
Further, in step (101), the malicious search engine keyword library comprises malicious search engine keywords that target URL paths and malicious search engine keywords that target web page content.
Further, for malicious search engine keywords that target URL paths, the honeypot web pages are constructed using an address rewriting technique.
Further, the address rewriting technique uses the Apache HTTP Server engine.
Further, for malicious search engine keywords that target web page content, the keywords are searched again on a search engine, and the web pages found are processed and then used as the honeypot web pages.
Further, the search engine ranking optimization techniques include registering a high-reputation domain name, adding links, and optimizing web page content.
Further, step (103) also includes distinguishing normal access from malicious attacks in the web page access records.
Further, the access records of the honeypot web pages are divided into search engine crawlers, normal access, and malicious attacks, wherein the search engine crawlers are likewise counted as malicious attacks.
Further, in step (103), every access whose HTTP response code is not 200 is also treated as a malicious attack.
Further, the data mining algorithm extracts the new malicious search engine keywords from HTTP Referrer information.
In a preferred embodiment of the present invention, virtual honeypot web pages are first constructed from a library of identified malicious search engine keywords that are currently popular on the network, using two methods: for malicious search engine keywords that target URL paths, the Apache HTTP Server engine is used to construct honeypot web pages with the address rewriting technique; for malicious search engine keywords that target web page content, the keywords are queried again on a search engine, and the returned web pages are processed and used as the corresponding honeypot web pages. Next, search engine ranking optimization techniques are used to get these simulated honeypot web pages indexed by search engines and to raise their ranking, actively attracting hackers. Finally, a data mining algorithm distinguishes malicious attacks from normal access records in the traffic, so that the purpose and steps of hackers' latest attacks can be analyzed and new malicious search engine keywords extracted. The new keywords are merged into the malicious search engine keyword library so that it is updated dynamically; the honeypot web pages are then updated dynamically from the updated library, and the cycle repeats.
By actively attracting hacker attacks, the active honeypot detection method based on search engine keywords of the present invention greatly improves the detection efficiency of honeypots and makes up for the passivity of conventional honeypots. The method also updates honeypot web pages dynamically to obtain information on the vulnerabilities hackers are currently attacking, mines malicious attack behavior from the traffic data, and analyzes the characteristics of hackers' latest attacks.
The concept, specific structure, and technical effects of the present invention are further described below with reference to the accompanying drawings, so that its objects, features, and effects can be fully understood.
Brief description of the drawings
Fig. 1 is a flowchart of the active honeypot detection method based on search engine keywords of the present invention;
Fig. 2 is an architecture diagram of the honeypot system for the active honeypot detection method based on search engine keywords of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below with reference to the accompanying drawings. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation and specific operating procedure are given, but the scope of protection of the present invention is not limited to the following embodiment.
In the present embodiment, as shown in Fig. 1, the active honeypot detection method based on search engine keywords of the present invention comprises the following steps:
Step 101: automatically construct honeypot web pages using the malicious search engine keyword library that has been collected and identified, as shown in Fig. 2:
Classify the malicious search engine keywords that have been collected and identified: malicious search engine keywords are divided into those that target the path of a web page address, i.e. the URL (Uniform Resource Locator) path, and those that target web page content.
For malicious search engine keywords that target URL paths, the engine of the Apache HTTP Server is used with the URL Rewrite technique, i.e. address rewriting, to construct honeypot web pages. URL Rewrite is address redirection: it intercepts an incoming user request and automatically redirects the request to another resource. The way the server handles user requests does not change; a request redirection step is simply added. In the present invention, the URL Rewrite technique redirects requests that match a known malicious search engine keyword for a URL path to the corresponding honeypot web page.
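As an illustration of the address rewriting step, the following minimal sketch generates Apache mod_rewrite directives from a list of URL-path keywords. It is not the patent's implementation: the function name `build_rewrite_rules`, the target page `/honeypot/index.html`, and the choice to place the rules in a per-directory (.htaccess) context are all assumptions made for the example.

```python
import re


def build_rewrite_rules(path_keywords, honeypot_target="/honeypot/index.html"):
    """Return mod_rewrite directives mapping each keyword path to a honeypot page.

    Assumes the rules go into a per-directory (.htaccess) context, where the
    pattern is matched against the request path without its leading slash.
    """
    lines = ["RewriteEngine On"]
    for keyword in path_keywords:
        path = keyword.lstrip("/")
        # Skip empty paths and paths with whitespace, which would split the
        # directive into the wrong number of arguments.
        if not path or any(ch.isspace() for ch in path):
            continue
        pattern = re.escape(path)  # treat the keyword as a literal path
        # [L] stops rule processing; the request is rewritten internally, so
        # the honeypot page is served at the URL the attacker searched for.
        lines.append(f"RewriteRule ^{pattern}$ {honeypot_target} [L]")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_rewrite_rules(["/phpmyadmin/index.php", "admin/login.asp"]))
```

Using an internal rewrite rather than an external redirect keeps the keyword path visible in the address bar and in the search index, which is what makes the page match the attacker's query.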
For malicious search engine keywords that target web page content, the keywords are queried again, and the web page content of the search results is stored on the local Apache web server as the corresponding honeypot web pages.
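The content-keyword branch could be sketched roughly as follows. The source does not say which search API is used or how the returned pages are processed, so the sketch takes the search step as a caller-supplied function and stores the pages unmodified; `store_content_honeypots`, the `search` callable, and the file naming scheme are hypothetical.

```python
import hashlib
from pathlib import Path


def store_content_honeypots(keywords, search, doc_root):
    """Save the pages returned for each content keyword under doc_root.

    `search` is a hypothetical caller-supplied callable that takes a keyword
    and returns an iterable of HTML strings; no real search API is assumed.
    Returns a dict mapping each keyword to the list of files written.
    """
    doc_root = Path(doc_root)
    doc_root.mkdir(parents=True, exist_ok=True)
    saved = {}
    for keyword in keywords:
        # A short hash of the keyword gives a filesystem-safe file name prefix.
        digest = hashlib.sha1(keyword.encode("utf-8")).hexdigest()[:12]
        paths = []
        for index, html in enumerate(search(keyword)):
            path = doc_root / f"{digest}_{index}.html"
            path.write_text(html, encoding="utf-8")
            paths.append(path)
        saved[keyword] = paths
    return saved
```

In a deployment, `doc_root` would be a directory served by the local Apache web server, and any sanitizing of the stored pages would happen before `write_text`.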
Step 102: have the constructed honeypot web pages indexed by a search engine using search engine ranking optimization techniques, and raise their ranking in the search engine so as to actively attract hackers to visit. Search engine optimization techniques such as registering a high-reputation domain name, adding links, and optimizing web page content are used to get the honeypot web pages indexed by the search engine and to raise their ranking further, so that they attract hackers more effectively.
Step 103: extract malicious search engine keywords from the web page access records using the following algorithm:
First, distinguish normal access from malicious attacks in the web page access records. After the honeypot web pages are indexed by a search engine, hackers can reach them by searching for malicious search engine keywords and then attack them. All access to the honeypot web pages is recorded, and hacker attacks are mined from these records. Because links to the honeypot web pages within the website are all hidden links, ordinary users cannot see them, but hackers' attack tools can find them. As shown in Fig. 2, besides normal access, the honeypot access records contain two further classes of access, attacks 201 and malicious searches 203; in addition, crawlers 202 from well-known search engines are identified by the search engine's characteristic user agent (User Agent) and source IP address. All access to the honeypot web pages other than normal access is therefore identified as a malicious attack. Moreover, attackers may try to access sensitive system resource paths; since access with an HTTP response code of 200 is considered normal, every access whose HTTP response code is not 200 is classified as a malicious attack.
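The classification rules described above can be illustrated with a small sketch. It covers only the two rules the text states explicitly (crawler identification by user agent plus source IP, and the non-200 rule); how malicious searches are separated from other normal traffic is not specified in the source and is left out. The crawler table is a placeholder: `examplebot` and the 192.0.2.0/24 documentation range are assumptions, not real crawler data.

```python
import ipaddress

# Hypothetical crawler signatures: user-agent token -> known source networks.
# 192.0.2.0/24 is a documentation-only range used here as a placeholder.
KNOWN_CRAWLERS = {
    "examplebot": [ipaddress.ip_network("192.0.2.0/24")],
}


def classify_access(user_agent, source_ip, status_code, crawlers=KNOWN_CRAWLERS):
    """Label one access record as 'crawler', 'malicious' or 'normal'."""
    address = ipaddress.ip_address(source_ip)
    agent = user_agent.lower()
    for token, networks in crawlers.items():
        # A crawler must match both the user agent and a known source network,
        # since the user agent alone is easy to forge.
        if token in agent and any(address in net for net in networks):
            return "crawler"
    # Per the text, any response code other than 200 is treated as an attack.
    if status_code != 200:
        return "malicious"
    return "normal"
```

For example, `classify_access("ExampleBot/1.0", "198.51.100.7", 404)` is labeled malicious, because the user agent claims to be a crawler but the source address is outside the known range and the response code is not 200.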
Second, for the records of malicious attacks, record the attack sources and use them as a database; initialize the data mining model of the data mining algorithm with the recorded attack-source database, then perform classification analysis and extract new malicious search engine keywords. The HTTP Referrer, i.e. the HTTP source address, is a field of the HTTP header (sent as the Referer request header) that indicates from where the current web page was linked, in the form of a URL. Through the HTTP Referrer, the current web page can tell where a visitor came from, so the new malicious search engine keywords that hackers may have used to reach the honeypot web pages can be extracted from HTTP Referrer information.
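A minimal sketch of the Referer-based extraction is shown below. It assumes the search terms appear in the query string of the referring URL under one of a few common parameter names; the names in `QUERY_PARAMS` are an assumption and would need adjusting per search engine. The source's data mining model and classification analysis are not reproduced here.

```python
from urllib.parse import parse_qs, urlsplit

# Query-parameter names assumed to carry the search terms (an assumption;
# adjust per search engine).
QUERY_PARAMS = ("q", "wd", "p")


def keyword_from_referer(referer):
    """Return the search terms embedded in a Referer URL, or None if absent."""
    if not referer:
        return None
    # parse_qs decodes percent-escapes and turns '+' into spaces.
    query = parse_qs(urlsplit(referer).query)
    for name in QUERY_PARAMS:
        values = query.get(name)
        if values and values[0].strip():
            return values[0].strip()
    return None
```

Note that this only works when the referring URL still carries its query string; requests whose Referer has been trimmed to the origin, or omitted, yield `None`.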
After the new malicious search engine keywords have been extracted, they are added to the malicious keyword library and the method jumps back to step 101 to construct new honeypot web pages, thereby updating the honeypot web pages dynamically.
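The merge that closes the loop might look like the sketch below. The helper name `merge_new_keywords` and the whitespace normalization are assumptions; the source only states that new keywords are added to the library before returning to step 101.

```python
def merge_new_keywords(library, candidates):
    """Add unseen keywords to the library in place and return the ones added.

    `library` is a set of known malicious keywords. Only the returned
    keywords need new honeypot pages on the next pass through step 101.
    """
    added = []
    for keyword in candidates:
        # Collapse runs of whitespace so trivially different spellings merge.
        normalized = " ".join(keyword.split())
        if normalized and normalized not in library:
            library.add(normalized)
            added.append(normalized)
    return added
```

Returning only the newly added keywords lets the next iteration build pages incrementally instead of regenerating the whole honeypot site.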
The preferred embodiments of the present invention have been described in detail above. It should be understood that a person of ordinary skill in the art can make many modifications and variations according to the concept of the present invention without creative effort. Therefore, any technical solution that a person skilled in the art can obtain on the basis of the prior art through logical analysis, reasoning, or limited experimentation under the concept of the present invention shall fall within the scope of protection determined by the claims.

Claims (10)

1. An active honeypot detection method based on search engine keywords, characterized by comprising the following steps:
Step (101): automatically construct honeypot web pages using a library of malicious search engine keywords that have been collected and identified;
Step (102): have the constructed honeypot web pages indexed by a search engine using search engine ranking optimization techniques, and raise their ranking in the search engine so as to actively attract hackers to visit;
Step (103): use a data mining algorithm to extract new malicious search engine keywords from the access records of the honeypot web pages, merge the extracted new keywords into the malicious search engine keyword library, and return to step (101).
2. The active honeypot detection method based on search engine keywords as claimed in claim 1, wherein, in step (101), the malicious search engine keyword library comprises malicious search engine keywords that target URL paths and malicious search engine keywords that target web page content.
3. The active honeypot detection method based on search engine keywords as claimed in claim 2, wherein, for the malicious search engine keywords that target URL paths, the honeypot web pages are constructed using an address rewriting technique.
4. The active honeypot detection method based on search engine keywords as claimed in claim 3, wherein the address rewriting technique uses the Apache HTTP Server engine.
5. The active honeypot detection method based on search engine keywords as claimed in claim 2, wherein, for the malicious search engine keywords that target web page content, the keywords are searched again on a search engine, and the web pages found are processed and then used as the honeypot web pages.
6. The active honeypot detection method based on search engine keywords as claimed in claim 1, wherein the search engine ranking optimization techniques include registering a high-reputation domain name, adding links, and optimizing web page content.
7. The active honeypot detection method based on search engine keywords as claimed in claim 1, wherein step (103) also includes distinguishing normal access from malicious attacks in the access records of the honeypot web pages.
8. The active honeypot detection method based on search engine keywords as claimed in claim 7, wherein the access records of the honeypot web pages are divided into search engine crawlers, normal access, and malicious attacks, and wherein the search engine crawlers are likewise counted as malicious attacks.
9. The active honeypot detection method based on search engine keywords as claimed in claim 7, wherein, in step (103), every access whose HTTP response code is not 200 is also treated as a malicious attack.
10. The active honeypot detection method based on search engine keywords as claimed in claim 1, wherein the data mining algorithm extracts the new malicious search engine keywords from HTTP Referrer information.
CN201310332730.7A 2013-08-01 2013-08-01 A kind of active honeypot detection method based on search engine keywords Expired - Fee Related CN103440454B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310332730.7A CN103440454B (en) 2013-08-01 2013-08-01 A kind of active honeypot detection method based on search engine keywords

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310332730.7A CN103440454B (en) 2013-08-01 2013-08-01 A kind of active honeypot detection method based on search engine keywords

Publications (2)

Publication Number Publication Date
CN103440454A (en) 2013-12-11
CN103440454B (en) 2016-04-06

Family

ID=49694147

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310332730.7A Expired - Fee Related CN103440454B (en) 2013-08-01 2013-08-01 A kind of active honeypot detection method based on search engine keywords

Country Status (1)

Country Link
CN (1) CN103440454B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978519A (en) * 2014-10-31 2015-10-14 哈尔滨安天科技股份有限公司 Implementation method and device of application-type honeypot
CN108229166A (en) * 2017-12-08 2018-06-29 重庆邮电大学 A kind of webpage Trojan horse detecting system and method searched for using leading type
CN110677414A (en) * 2019-09-27 2020-01-10 北京知道创宇信息技术股份有限公司 Network detection method and device, electronic equipment and computer readable storage medium
CN110971605A (en) * 2019-12-05 2020-04-07 福建天晴在线互动科技有限公司 Method for acquiring pirated game server information by capturing data packet
CN111917691A (en) * 2019-05-10 2020-11-10 张长河 WEB dynamic self-adaptive defense system and method based on false response

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1567118A (en) * 2004-03-29 2005-01-19 四川大学 Computer viruses detection and identification system and method
CN101060444A (en) * 2007-05-23 2007-10-24 西安交大捷普网络科技有限公司 Bayesian statistical model based network anomaly detection method
CN102571484A (en) * 2011-12-14 2012-07-11 上海交通大学 Method for detecting and finding online water army

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
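Zhou Peiying (周佩颖): "恶意的URL捕获分析系统" [Malicious URL capture and analysis system], China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology, 15 April 2011 (2011-04-15), pages 139 - 133 *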

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978519A (en) * 2014-10-31 2015-10-14 哈尔滨安天科技股份有限公司 Implementation method and device of application-type honeypot
CN108229166A (en) * 2017-12-08 2018-06-29 重庆邮电大学 A kind of webpage Trojan horse detecting system and method searched for using leading type
CN111917691A (en) * 2019-05-10 2020-11-10 张长河 WEB dynamic self-adaptive defense system and method based on false response
CN110677414A (en) * 2019-09-27 2020-01-10 北京知道创宇信息技术股份有限公司 Network detection method and device, electronic equipment and computer readable storage medium
CN110971605A (en) * 2019-12-05 2020-04-07 福建天晴在线互动科技有限公司 Method for acquiring pirated game server information by capturing data packet
CN110971605B (en) * 2019-12-05 2022-03-08 福建天晴在线互动科技有限公司 Method for acquiring pirated game server information by capturing data packet

Also Published As

Publication number Publication date
CN103440454B (en) 2016-04-06

Similar Documents

Publication Publication Date Title
US9723018B2 (en) System and method of analyzing web content
US7873635B2 (en) Search ranger system and double-funnel model for search spam analyses and browser protection
CN103559235B (en) A kind of online social networks malicious web pages detection recognition methods
US9430577B2 (en) Search ranger system and double-funnel model for search spam analyses and browser protection
CN103279710B (en) Method and system for detecting malicious codes of Internet information system
CN107688743B (en) Malicious program detection and analysis method and system
CN105491053A (en) Web malicious code detection method and system
US20150128272A1 (en) System and method for finding phishing website
Kim et al. Detecting fake anti-virus software distribution webpages
CN107437026B (en) Malicious webpage advertisement detection method based on advertisement network topology
CN111104579A (en) Identification method and device for public network assets and storage medium
CN103440454B (en) A kind of active honeypot detection method based on search engine keywords
CN105376217B (en) A kind of malice jumps and the automatic judging method of malice nested class objectionable website
US20200336498A1 (en) Method and apparatus for detecting hidden link in website
CN111859234A (en) Illegal content identification method and device, electronic equipment and storage medium
CN113454621A (en) Method, apparatus and computer program for collecting data from multiple domains
WO2020211130A1 (en) Hidden link detection method and apparatus for website
CN106250761B (en) Equipment, device and method for identifying web automation tool
CN109756467B (en) Phishing website identification method and device
Sun et al. AutoBLG: Automatic URL blacklist generator using search space expansion and filters
Shyni et al. Phishing detection in websites using parse tree validation
Brites et al. Phishfry-a proactive approach to classify phishing sites using scikit learn
Zeng et al. Hidden path: Understanding the intermediary in malicious redirections
CN104008339A (en) Active technology based malicious code capture method
Wang et al. Minedetector: Javascript browser-side cryptomining detection using static methods

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C53 Correction of patent of invention or patent application
CB03 Change of inventor or designer information

Inventor after: Zou Futai

Inventor after: Bai Wei

Inventor after: Wang Jiahui

Inventor after: Pan Daoxin

Inventor after: Yi Ping

Inventor before: Zou Futai

Inventor before: Bai Wei

Inventor before: Pan Daoxin

Inventor before: Yi Ping

COR Change of bibliographic data

Free format text: CORRECT: INVENTOR; FROM: ZOU FUTAI BAI WEI PAN DAOXIN YI PING TO: ZOU FUTAI BAI WEI WANG JIAHUI PAN DAOXIN YI PING

C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160406

Termination date: 20180801
