CN103559235B - Method for detecting and identifying malicious web pages in online social networks - Google Patents


Info

Publication number
CN103559235B
Authority
CN
China
Legal status
Expired - Fee Related
Application number
CN201310507897.2A
Other languages
Chinese (zh)
Other versions
CN103559235A
Inventor
李沁蕾
王蕊
贾晓启
张道娟
Current Assignee
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Application filed by Institute of Information Engineering of CAS
Priority to CN201310507897.2A
Publication of CN103559235A
Application granted
Publication of CN103559235B
Expired - Fee Related (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2119 Authenticating web pages, e.g. with suspicious links

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Economics (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Virology (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present invention relates to a method for detecting and identifying malicious web pages in online social networks. The steps are: 1) for any web page in an online social network that is to be examined, count the frequency of occurrence of every keyword in the page, and, based on the page's source code, divide the page into one or more sets of different types: an HTML tag set, a JavaScript set, and a URL set; 2) from each of these sets that is non-empty, extract static page features that reveal obfuscation to obtain suspicious fields, and combine them with the times at which those fields appear to obtain the page's association features; 3) create an association information database that stores the page's association features and updates the association features of the pages in the database in real time, and derive the page's propagation speed from these association features; 4) detect and identify malicious web pages from the propagation speed combined with the statistical features obtained above. The invention has good generality and accurately characterizes malicious web pages in online social networks; its detection and identification of malicious pages is more accurate and more efficient, at a lower analysis cost.

Description

Method for detecting and identifying malicious web pages in online social networks
Technical field
The invention belongs to the field of network security. It relates to a method for identifying malicious web pages in online social networks, and in particular to such a method based on extracting the features of malicious web pages.
Background art
With the rapid growth of online social networks (Online Social Network, OSN), every major OSN platform has a huge user base. Together with users' private information and the potential economic value it carries, this has made these platforms a target for more and more hackers. Among attacks on online social networks, cross-site scripting (Cross-site Scripting, XSS) is a common and destructive attack pattern: network worms that exploit cross-site scripting vulnerabilities can infect large numbers of users in a short time and can even disrupt the normal operation of servers. Extracting effective web page features to improve the identification of malicious web pages in online social networks is therefore an urgent problem.
Existing analyses of malicious web pages in online social networks mostly rely on complex static analysis. The source code of a web page usually contains HTML, CSS, URI, and JavaScript elements, and malicious HTML, CSS, URIs, or JavaScript can make the page behave maliciously when it loads in the browser, for example by stealing cookies or opening a phishing website. In online social networks, users can freely type content of a certain length into text boxes on a page, including HTML, CSS, URI, and JavaScript code. To guard against malicious code that such input may contain, the content must be statically analyzed when it is submitted: from the perspectives of HTML, CSS, URI, and JavaScript respectively, formal methods are used to judge whether the structure and content of these elements could produce malicious behavior.
Among malicious web pages, malicious code exploiting XSS vulnerabilities is the most common kind of web virus, and many mature analysis techniques exist for it. When analyzing pages outside online social networks (for example portals and forums), one approach starts from obfuscated code: it extracts features of the obfuscated code in a page to judge whether the page contains suspicious malicious code. The extracted features mainly include keywords, JavaScript features (such as length and character counts), and URL features.
Among existing methods for analyzing, detecting, and identifying malicious web pages in online social networks, most static analysis methods require several complex analysis steps, take a long time, and are not timely. Static analysis should have the advantage of a lower time cost than dynamic analysis, but this advantage is not fully realized, and the web request delays caused by complex analysis and computation also harm network applications. A simple and effective feature extraction method for malicious web pages in online social networks that lowers analysis cost is therefore a problem that urgently needs to be studied and solved.
Summary of the invention
To address the problem of detecting and identifying malicious web pages in online social networks, the object of the invention is to propose a detection and identification method based on extracting the features of malicious web pages in online social networks. After the pages of an online social network are analyzed, quantifiable features are extracted from the following perspectives: keywords, JavaScript, HTML, URLs, and the inherent characteristics of online social networks. These extracted features are then used to identify pages in the online social network that carry malicious code exploiting XSS vulnerabilities.
The technical solution of the invention is as follows: a method for detecting and identifying malicious web pages in online social networks, comprising the steps:
1) For any web page in the online social network that is to be examined, count the frequency of occurrence of every keyword in the page; based on the page's source code, divide the page into one or more sets of different types: an HTML tag set, a JavaScript set, or a URL set;
2) From those of the above sets that are non-empty, extract static page features that reveal obfuscation to obtain suspicious fields, and combine them with the times at which the suspicious fields appear to obtain the page's association features;
3) Create an association information database that stores the page's association features and updates the association features of the pages in the database in real time, and derive the page's propagation speed from the association features;
4) Detect and identify malicious web pages according to the page's propagation speed combined with one or more of the following features: the keyword occurrence frequencies obtained by counting, and any suspicious JavaScript scripts, suspicious HTML tags, or suspicious URLs that were detected.
Further, code segments in the page source code that conform to the HTML tag format are extracted and collected into the HTML tag set. An HTML tag consists of a start tag and/or an end tag; a start tag is the element name enclosed in angle brackets, and an end tag is a slash followed by the element name, enclosed in angle brackets.
Further, JavaScript scripts in the page source code appear between <script></script> tags or after "javascript:". The scripts are extracted according to these positions and collected into the JavaScript set.
Further, the URL set is obtained by searching the page source code for valid character strings that begin with a protocol name such as HTTP, HTTPS, or FTP and extracting them as URLs.
Further, the static features that reveal obfuscation are extracted from the HTML tag set as follows:
Collect information on all tags in the HTML tag set and extract the maximum tag length, the number of long tags in the set, and the proportion of JavaScript strings contained in the tags, as measures of the degree of HTML tag obfuscation.
Further, the static features that reveal obfuscation are extracted from the JavaScript set as follows:
Collect information on all scripts in the JavaScript set and extract the maximum script string length, the proportion of encoded characters in the script strings, and the number of string concatenations in the set, as measures of the degree of JavaScript obfuscation.
Further, the static features that reveal obfuscation are extracted from the URL set as follows:
Collect information on all URLs in the set and extract the maximum URL length, the number of long URLs in the set, and the proportion of encoded characters in the URLs, as measures of the degree of URL obfuscation.
Further, the keywords are JavaScript functions or HTML tags whose frequencies of occurrence differ between benign and malicious scripts, including: eval, document.write, unescape, fromCharCode, createElement, createTextNode.
Further, the propagation speed is the frequency with which suspected malicious code appears in web pages per unit time. The propagation speed of a suspicious string in web pages is calculated as follows:
1) Collect the page's <time at which the suspicious field appears, suspicious string content> records in the association information database;
2) For each record, compute the propagation speed of its string content: query the database for the number of records with the same string content whose time falls within the hour up to and including time t;
3) Take the maximum of all these propagation speeds as the page's propagation speed.
Beneficial effects of the present invention:
1. The invention extracts a set of features of malicious web pages in online social networks that has good generality.
2. Based on web page structure, the invention preprocesses pages, extracts page features by element type, and builds an association information database linking pages.
3. The invention fully takes into account how malicious code propagates in online social networks and extracts a set of propagation-based features aimed specifically at online social networks.
In summary, the proposed method accurately characterizes malicious web pages in online social networks, and detects and identifies them more accurately and efficiently at a lower analysis cost.
Brief description of the drawings
Fig. 1 is a flow diagram of the method for detecting and identifying malicious web pages in online social networks.
Detailed description of the invention
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawing. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art from these embodiments without creative effort fall within the scope of protection of the invention.
One specific embodiment of the method for detecting and identifying malicious web pages in online social networks comprises the steps:
1) For any web page in the online social network that is to be examined, count the occurrence frequency of every given keyword in the page;
2) Analyze the page structure and divide the page into sets of different types according to its source code, decomposing the source into an HTML set, a JavaScript set, and a URL set;
3) From the HTML, JavaScript, and URL sets, extract the static features that reveal obfuscation to obtain suspicious fields, and combine them with the times at which the suspicious fields appear to obtain the page's association features;
4) Store the page's association features and update the association information database so that it holds the latest association features; from this database, which records the association features of web pages in the online social network, extract the page's association features;
5) Derive the page's propagation speed from the association information, and combine it with the given keywords, suspicious JavaScript, suspicious HTML, and suspicious URLs (five features in total) to obtain the page's feature vector;
6) Detect and identify malicious web pages according to the feature vector.
In one embodiment of the invention, keywords are certain JavaScript functions or HTML tags whose frequencies of occurrence differ between benign and malicious scripts. Some such keywords appear rarely in benign pages and frequently in malicious pages, so they can serve as the keywords of a page, and the frequency with which they occur can be used to judge whether the page is malicious.
In one embodiment of the invention, the components of a page are obtained by analyzing its structure, and the page is preprocessed with the goal of dividing it into sets of different types. Because all the extracted features come from HTML tags, JavaScript scripts, and URLs, preprocessing divides the page source into an HTML tag set, a JavaScript script set, and a URL set. Later steps then only need to extract the relevant information from each of these three sets, which avoids processing large amounts of data every time and the long processing time this would cause; analyzing the classified sets separately also makes the extracted features more accurate.
In one embodiment of the invention, static page features are extracted on the basis that they reveal obfuscation in the page. For the three sets, the extraction methods are:
(1) For the HTML tag set, collect information on all tags in the set and extract the maximum tag length, the number of long tags in the set, and the proportion of JavaScript strings contained in the tags; these quantified statistics serve as measures of the degree of HTML tag obfuscation.
(2) For the JavaScript script set, collect information on all scripts in the set and extract the maximum script string length, the proportion of encoded characters in the script strings, and the number of string concatenations in the set; these quantified statistics serve as measures of the degree of JavaScript obfuscation.
(3) For the URL set, collect information on all URLs in the set and extract the maximum URL length, the number of long URLs in the set, and the proportion of encoded characters in the URLs; these quantified statistics serve as measures of the degree of URL obfuscation.
In one embodiment of the invention, owing to the characteristics of online social networks, malicious code propagates differently in social networks than in general networks. Most intuitively, the high clustering and short average shortest path of social network topology make malicious code spread much faster in social networks than in general networks. To quantify propagation speed, the invention defines it as the frequency with which suspected malicious code appears in web pages per unit time; computing the speed therefore relies on the number of times suspected malicious code appeared in pages sent by the server within the most recent hour. Extracting this feature requires knowledge of all pages examined during the past unit of time, so an association information database is created that updates the association features of online social network pages in real time, and the required features are obtained statistically from it.
In one embodiment of the invention, the association information database must be updated continually and must hold the association information of all pages. After a page's association features are obtained, its association information is saved to the database. Only the association information from the most recent hour is needed; information older than one hour is never used as a reference. To improve database access efficiency, a maintenance pass runs every ten minutes and deletes all information older than one hour.
Fig. 1 shows the flow of the method for detecting and identifying malicious web pages in online social networks, comprising the steps:
1. Extract the first part of the page features, mainly the keyword features.
When a malicious page loads in the client browser, it can carry out attacks that are implemented by executing a combination of functions. In static analysis of the page's keywords, the number of times each keyword appears is used as the keyword feature, in place of the keyword execution order that dynamic analysis would use. Statistics show that some script functions may appear in all kinds of pages, but the frequencies with which they are used differ widely. Keywords may include, but are not limited to, eval, document.write, unescape, fromCharCode, createElement, and createTextNode; those skilled in the art know how to choose keywords for malicious web page vulnerabilities, so the keywords are not limited to the types above. Take the string-evaluation function eval, which executes code supplied as a string: eval is a legitimate function present in many kinds of pages, but it usually appears with low frequency. In malicious pages, eval appears more often than usual, so features of such keywords can be extracted as one indicator for identifying malicious pages.
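The keyword feature can be sketched as a per-keyword count over the raw page source. This is a minimal illustration, not the patented implementation: the keyword list is the one given above, and the rule that a match must not be part of a longer identifier (so `evaluate` does not count as `eval`) is an assumption.

```python
import re

# Keyword list from the description; the text states it is not exhaustive.
KEYWORDS = ("eval", "document.write", "unescape", "fromCharCode",
            "createElement", "createTextNode")


def keyword_frequencies(source: str) -> dict:
    """Return {keyword: number of occurrences} for the raw page source.

    Assumption (not in the patent): a match must not be embedded in a longer
    identifier, so 'evaluate' or 'document.writeln' are not counted.
    """
    counts = {}
    for kw in KEYWORDS:
        # (?<![\w$]) and (?![\w$]) act as JavaScript-aware word boundaries.
        pattern = r"(?<![\w$])" + re.escape(kw) + r"(?![\w$])"
        counts[kw] = len(re.findall(pattern, source))
    return counts


page = "var s = String.fromCharCode(97); eval(s); eval(unescape('%61'));"
print(keyword_frequencies(page))
# → {'eval': 2, 'document.write': 0, 'unescape': 1, 'fromCharCode': 1, 'createElement': 0, 'createTextNode': 0}
```

Counting raw occurrences keeps the check purely static, which matches the idea of replacing execution order with occurrence counts.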
2. Preprocess the page, classifying the page source code by element type.
Page source code contains many kinds of elements, the most basic being HTML tags, JavaScript scripts, and URLs. One starting point of the invention is to look for traces of malicious code in HTML tags, JavaScript scripts, and URLs. For ease of implementation, the page is preprocessed before the remaining features are extracted, yielding sets of the three element types.
The preprocessing is as follows:
1) HTML tags have a canonical format: a tag consists of a start tag and an end tag, where the start tag is the element name enclosed in angle brackets and the end tag is a slash and the element name enclosed in angle brackets; some tags, such as <br/>, have no end tag. Code segments that conform to the HTML tag format are extracted from the page source and collected into a set.
2) JavaScript in a page usually appears between <script></script> tags or after "javascript:". The page source is analyzed according to where scripts appear, and the JavaScript scripts are extracted and collected into a set.
3) URLs are the addresses of resources on the Internet and follow a uniform standard; a page may contain resources from its own domain or from other domains. A URL begins with a protocol name, and the set of protocols commonly used on the Internet is limited, including HTTP, HTTPS, and FTP. To collect the URL set, it is therefore enough to search the page source for valid strings that begin with a protocol name and extract them as URLs.
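The preprocessing step can be sketched roughly with regular expressions. The patterns (`TAG_RE`, `SCRIPT_RE`, `JS_URI_RE`, `URL_RE`) and the function name `preprocess` are hypothetical; regular expressions only approximate HTML, so, for example, a `<` comparison inside a script body may be misread as part of a tag.

```python
import re

# Assumed patterns; a production system would use a real HTML parser.
TAG_RE = re.compile(r"<[^<>]+>")  # start, end or self-closing tag
SCRIPT_RE = re.compile(r"<script\b[^>]*>(.*?)</script\s*>",
                       re.IGNORECASE | re.DOTALL)  # body between <script> tags
JS_URI_RE = re.compile(r"javascript:([^\"'>]*)", re.IGNORECASE)  # text after "javascript:"
URL_RE = re.compile(r"(?:https?|ftp)://[^\s\"'<>]+", re.IGNORECASE)  # protocol-prefixed strings


def preprocess(source: str) -> dict:
    """Split page source into the HTML tag, JavaScript and URL sets."""
    scripts = [s for s in SCRIPT_RE.findall(source) if s.strip()]
    scripts += JS_URI_RE.findall(source)
    return {
        "html": TAG_RE.findall(source),
        "js": scripts,
        "url": URL_RE.findall(source),
    }


page = '<a href="javascript:alert(1)">x</a><script>var u="http://evil.example/p?q=1";</script><br/>'
sets = preprocess(page)
print(sets["url"])  # → ['http://evil.example/p?q=1']
```

Each later feature extractor then works on one list only, which is the efficiency argument made in the text.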
3. Extract the second part of the features, mainly HTML tag features.
HTML tags make up the structure of a page. Scripts can add and delete tags dynamically, tag attributes can be modified dynamically by scripts (for example, value), and some attributes are executed automatically (for example, src). HTML tags are therefore a good hiding place for malicious scripts. Ordinary HTML tags are of limited length; if a malicious script is hidden in an HTML tag, the tag is likely to be longer than the tags in benign pages.
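The three HTML tag statistics (maximum tag length, number of long tags, proportion of script in tags) could be computed as below. The text gives no threshold for a "long" tag, so `LONG_TAG_THRESHOLD` is a hypothetical value, and "proportion of JavaScript in tags" is interpreted here, as an assumption, as the share of tags that carry an `on*` event handler or a `javascript:` URI.

```python
import re

LONG_TAG_THRESHOLD = 100  # hypothetical cut-off; the patent does not give a value
# Assumed markers of script embedded in a tag: on* handlers or javascript: URIs.
JS_IN_TAG_RE = re.compile(r"\bon[a-z]+\s*=|javascript:", re.IGNORECASE)


def html_tag_features(tags: list) -> tuple:
    """(max tag length, number of long tags, share of tags carrying script)."""
    if not tags:
        return (0, 0, 0.0)
    max_len = max(len(t) for t in tags)
    long_tags = sum(1 for t in tags if len(t) > LONG_TAG_THRESHOLD)
    with_js = sum(1 for t in tags if JS_IN_TAG_RE.search(t))
    return (max_len, long_tags, with_js / len(tags))


tags = ["<br/>", "<img src=x onerror=alert(1)>", "<p>", "</p>"]
print(html_tag_features(tags))  # → (28, 0, 0.25)
```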
4. Extract the third part of the features, mainly JavaScript script features.
XSS malicious code is usually written in JavaScript. Beyond making the code aggressive, in many cases the author of malicious code also obfuscates it to confuse the victim, reducing the program's readability so that the victim does not notice it. A common obfuscation technique is to encode the malicious code. An encoded script is noticeably longer, and the proportion of encoded characters in its strings also increases.
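A sketch of the JavaScript statistics (maximum script length, share of encoded characters, number of string concatenations). The text does not define an "encoded character" or a "concatenation"; the patterns below (`%XX`, `\xXX`, `\uXXXX` escapes, and a `+` adjacent to a quoted literal) are assumptions chosen for illustration.

```python
import re

# Assumed definition of an "encoded character": %XX, \xXX or \uXXXX escapes.
ENCODED_RE = re.compile(r"%[0-9A-Fa-f]{2}|\\x[0-9A-Fa-f]{2}|\\u[0-9A-Fa-f]{4}")
# Assumed definition of a string concatenation: a '+' next to a quoted literal.
CONCAT_RE = re.compile(r"[\"']\s*\+|\+\s*[\"']")


def js_features(scripts: list) -> tuple:
    """(max script length, share of encoded characters, concatenation count)."""
    if not scripts:
        return (0, 0.0, 0)
    max_len = max(len(s) for s in scripts)
    total = sum(len(s) for s in scripts)
    encoded = sum(len(m) for s in scripts for m in ENCODED_RE.findall(s))
    concats = sum(len(CONCAT_RE.findall(s)) for s in scripts)
    return (max_len, encoded / total if total else 0.0, concats)


script = 'eval(unescape("%61%6c"+"%65"))'
print(js_features([script]))  # → (30, 0.3, 1)
```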
5. Extract the fourth part of the features, mainly URL features.
When a page contains reflected XSS, its source may contain malicious URLs that trigger the XSS, and these URLs carry malicious scripts. To lure users into clicking a malicious URL, the author of the malicious code deliberately processes and deforms the URL so that users cannot make out the content of its parameter part, tricking them into clicking the URL in the page with their guard down.
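The URL statistics follow the same pattern. `LONG_URL_THRESHOLD` is a hypothetical value, and treating percent-encoded bytes (`%XX`) as the "encoded characters" of a URL is an assumption; other deformations, such as mixed-case or double encoding, are not modelled here.

```python
import re

LONG_URL_THRESHOLD = 150  # hypothetical cut-off; the patent does not give a value
PCT_RE = re.compile(r"%[0-9A-Fa-f]{2}")  # assumed: percent-encoded bytes count as encoded


def url_features(urls: list) -> tuple:
    """(max URL length, number of long URLs, share of percent-encoded characters)."""
    if not urls:
        return (0, 0, 0.0)
    max_len = max(len(u) for u in urls)
    long_urls = sum(1 for u in urls if len(u) > LONG_URL_THRESHOLD)
    total = sum(len(u) for u in urls)
    encoded = sum(3 * len(PCT_RE.findall(u)) for u in urls)  # each %XX is 3 characters
    return (max_len, long_urls, encoded / total if total else 0.0)


urls = ["http://a.example/?q=%3Cscript%3E", "http://b.example/"]
print(url_features(urls))
```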
6. Store the page's association information and update the association information database.
An association information database is set up to hold the association information of web pages. Association information means certain suspicious fields in a page (such as suspicious JavaScript strings, suspicious URLs, and suspicious HTML tags) together with the times at which they appear. Because propagation speed must be computed and speed depends directly on time, time is an important field in the database.
After the first five steps have been applied to a page, a set of suspicious strings is obtained for that page. To make it easy to compute the propagation speed of pages that appear later in the network traffic, the association information linking this page's suspicious strings to later pages must be stored: records of the form <time at which the suspicious field appears, suspicious string content> are inserted into the database. In addition, to keep the database efficient, a redundancy cleanup of its contents runs every ten minutes and deletes all records older than one hour. This keeps the database small, so that the association information database is updated and maintained in a timely way.
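A minimal sketch of the association information database, using Python's built-in `sqlite3` with an in-memory table. The class, table, and method names are hypothetical; the one-hour retention window and the ten-minute cleanup interval come from the text. Because cleanup runs only every ten minutes, rows up to ten minutes past the window may remain briefly, so a speed query should still filter by time.

```python
import sqlite3

WINDOW_S = 3600       # records older than one hour are no longer needed
PRUNE_EVERY_S = 600   # maintenance pass every ten minutes


class AssociationDB:
    """Hypothetical store of <time, suspicious string> association records."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE assoc (t REAL NOT NULL, content TEXT NOT NULL)")
        # Index supports the "same content within a time window" lookup.
        self.conn.execute("CREATE INDEX assoc_content_t ON assoc (content, t)")
        self.last_prune = None

    def add_page(self, t: float, suspicious: list) -> None:
        """Insert one row per suspicious string found in a page seen at time t."""
        self.conn.executemany("INSERT INTO assoc (t, content) VALUES (?, ?)",
                              [(t, s) for s in suspicious])
        # Run the cleanup at most once every ten minutes, driven by page timestamps.
        if self.last_prune is None or t - self.last_prune >= PRUNE_EVERY_S:
            self.prune(t)

    def prune(self, now: float) -> None:
        """Delete every record older than one hour."""
        self.conn.execute("DELETE FROM assoc WHERE t < ?", (now - WINDOW_S,))
        self.last_prune = now

    def size(self) -> int:
        return self.conn.execute("SELECT COUNT(*) FROM assoc").fetchone()[0]
```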
7. Extract the fifth part of the features, mainly the page association features.
In social networks, similar malicious code exploiting XSS vulnerabilities spreads quickly through pages, so propagation speed is an effective feature for identifying malicious pages. A method is therefore needed to extract, from UTF-8-encoded page source, a quantitative feature that reflects propagation speed.
By analogy with the simple definition of scalar speed (the distance an object travels per unit time), propagation speed is defined as the number of times a string appears in web pages per unit time. The propagation speed of a suspicious string in web pages is calculated as follows:
1) Collect the page's <time t, string content C> records stored in step 6;
2) For each record, compute the propagation speed of its string content: query the database for the number of records whose string content is the same and whose time lies within the hour up to and including t; this count is the propagation speed of that record;
3) Take the maximum of all these propagation speeds as the page's propagation speed.
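Steps 1) to 3) can be sketched as follows, assuming the database contents are available as a list of `(t, content)` pairs with times in seconds. The function name is hypothetical, and treating the window as the closed interval [t - 1 h, t] is an assumption; the text does not say whether the endpoints are included.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

WINDOW_S = 3600.0  # "the previous hour"


def page_propagation_speed(page_records, db_records):
    """Propagation speed of one page, following steps 1) to 3).

    page_records: (t, content) pairs produced for this page in step 6.
    db_records:   all (t, content) pairs currently in the association database,
                  including this page's own records.
    """
    times_by_content = defaultdict(list)
    for t, content in db_records:
        times_by_content[content].append(t)
    for times in times_by_content.values():
        times.sort()

    speed = 0
    for t, content in page_records:
        times = times_by_content.get(content, [])
        # Step 2: records with the same content and time in [t - 1 h, t].
        n = bisect_right(times, t) - bisect_left(times, t - WINDOW_S)
        speed = max(speed, n)  # step 3: keep the maximum
    return speed


history = [(0, "x"), (1000, "x"), (3000, "x"), (3500, "y"), (5000, "x")]
page = [(3600, "x"), (3600, "y")]
print(page_propagation_speed(page, history + page))  # → 4
```

The record at t = 5000 is excluded because it lies after the page's own timestamp.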
8. Merge the five feature parts obtained in steps 1, 3, 4, 5, and 7 to obtain the page's feature vector.
Online social networks are distinctive compared with general network applications, and XSS malicious code propagates differently in them than in general networks. Most intuitively, the high clustering and short average shortest path of social network topology make XSS malicious code spread much faster in social networks than in general networks. An actual incident illustrates this: in 2003 the computer worm Blaster infected 336,000 machines in 20 hours, whereas the social network XSS worm Samy infected 1,000,000 users in 20 hours. This comparison shows that the number of users infected by malicious code per unit time is about 3 times higher in the online social network than in the general network. If propagation speed can be represented, malicious pages in network traffic can therefore be distinguished more effectively.
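Merging the parts amounts to concatenating them into one numeric vector with a fixed layout, sketched below with hypothetical names. The text does not name the classifier used in the final detection step, so this sketch stops at the vector; any trained classifier could in principle consume it.

```python
KEYWORD_ORDER = ("eval", "document.write", "unescape", "fromCharCode",
                 "createElement", "createTextNode")


def feature_vector(keyword_counts: dict, html_feats: tuple, js_feats: tuple,
                   url_feats: tuple, speed: int) -> list:
    """Concatenate the five feature parts into one numeric vector.

    Keyword counts are laid out in a fixed order so that every page yields
    the same vector layout; keywords missing from the dict count as 0.
    """
    keyword_part = [keyword_counts.get(k, 0) for k in KEYWORD_ORDER]
    return [float(x) for x in (*keyword_part, *html_feats, *js_feats, *url_feats, speed)]


v = feature_vector({"eval": 2, "unescape": 1}, (28, 0, 0.25), (30, 0.3, 1), (32, 0, 0.12), 4)
print(len(v))  # → 16
```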
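The "about 3 times" figure follows directly from the two counts quoted above, since both were observed over the same 20-hour period:

```latex
\frac{1\,000\,000 \text{ infections} / 20\ \text{h}}{336\,000 \text{ infections} / 20\ \text{h}}
  = \frac{1\,000\,000}{336\,000} \approx 2.98 \approx 3
```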
Experimental data:
Metric        Malicious    Benign
Samples       11,761       18,302
Precision     87.1%        96.1%
Recall        94.3%        91.1%
F-measure     90.6%        93.5%
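The F-measure row is consistent with the precision and recall rows if it is the usual harmonic mean of the two (an assumption; the text does not define it):

```latex
F = \frac{2PR}{P + R}, \qquad
F_{\text{malicious}} = \frac{2(0.871)(0.943)}{0.871 + 0.943} = \frac{1.6427}{1.814} \approx 0.906, \qquad
F_{\text{benign}} = \frac{2(0.961)(0.911)}{0.961 + 0.911} = \frac{1.7509}{1.872} \approx 0.935
```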
The results in the table show that detection using the features proposed in the invention reaches an average discrimination rate of about 90%, a good detection result, which indicates that the "propagation speed" proposed in the invention plays an important role in identifying malicious web pages in online social networks.

Claims (9)

1. A method for detecting and identifying malicious web pages in online social networks, comprising the steps:
1) For any web page in the online social network that is to be examined, count the frequency of occurrence of every keyword in the page; based on the source code of the page, divide the page into one or more sets of different types: an HTML tag set, a JavaScript set, or a URL set;
2) From those of the above sets that are non-empty, extract static page features that reveal obfuscation to obtain suspicious fields, and combine them with the times at which the suspicious fields appear to obtain the page's association features;
3) Create an association information database that stores the page's association features and updates the association features of the pages in the database in real time, and derive the page's propagation speed from the association features;
4) Detect and identify malicious web pages according to the page's propagation speed combined with one or more of the following features: the keyword occurrence frequencies obtained by counting, and any suspicious JavaScript scripts, suspicious HTML tags, or suspicious URLs that were detected.
2. online social networks malicious web pages detection recognition methods as claimed in claim 1, it is characterised in that source code from webpage Take out and meet the code segment of html tag and collect into html tag set, described html tag by label and/or end Label forms, the masurium that described beginning label is surrounded by bracket, brace that end-tag is surrounded by bracket and masurium.
3. online social networks malicious web pages detection recognition methods as claimed in claim 1, it is characterised in that source code in webpage The position that JavaScript script occurs in is:<script></script>between label or at " javascript: after ";According to described Script occurs that JavaScript script is taken out in position, collects into set.
4. The method for detecting and identifying malicious web pages in online social networks according to claim 1, characterized in that URLs are located in the web page source code as valid strings beginning with the protocol name HTTP, HTTPS or FTP, and are separated, extracted and collected into the URL set.
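A sketch of the URL extraction in claim 4, assuming a URL ends at the first whitespace, quote, or angle bracket. `extract_urls` and that boundary rule are hypothetical; the claim only requires a valid string that starts with the HTTP, HTTPS, or FTP scheme.

```python
import re

# "http://", "https://" or "ftp://" (any case), then characters up to the first
# whitespace, quote or angle bracket.
URL_RE = re.compile(r"(?:https?|ftp)://[^\s\"'<>]+", re.IGNORECASE)


def extract_urls(source: str) -> list[str]:
    """Collect every HTTP/HTTPS/FTP URL in the page source into the URL set."""
    return URL_RE.findall(source)
```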
5. The method for detecting and identifying malicious web pages in online social networks according to claim 1 or 2, characterized in that the static web page features identifying obfuscated characters are extracted from the HTML tag set as follows:
collect statistics on all tags in the HTML tag set, and extract the maximum tag length, the number of long tags in the set, and the proportion of JavaScript strings contained in the tags, as a measure of the degree of HTML tag obfuscation.
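The three HTML tag measures of claim 5 might be computed as follows. The long-tag threshold of 100 characters is hypothetical, and "JavaScript contained in a tag" is assumed here to mean an inline event handler (`onload=`, `onclick=`, ...) or a `javascript:` pseudo-URL, with the proportion taken over tags rather than over characters. `html_tag_features` is a hypothetical name.

```python
import re

# Assumed markers of inline JavaScript inside a tag.
INLINE_JS_RE = re.compile(r"\bon[a-z]+\s*=|javascript:", re.IGNORECASE)


def html_tag_features(tags: list[str], long_threshold: int = 100):
    """Return (max tag length, number of long tags, share of tags with inline JS)."""
    if not tags:
        return 0, 0, 0.0
    lengths = [len(t) for t in tags]
    max_len = max(lengths)
    n_long = sum(1 for n in lengths if n > long_threshold)
    js_ratio = sum(1 for t in tags if INLINE_JS_RE.search(t)) / len(tags)
    return max_len, n_long, js_ratio
```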
6. The method for detecting and identifying malicious web pages in online social networks according to claim 1 or 3, characterized in that the static web page features identifying obfuscated characters are extracted from the JavaScript set as follows:
collect statistics on all scripts in the JavaScript set, and extract the maximum length of the script strings, the proportion of encoded characters in the script strings, and the number of string concatenations in the set, as a measure of the degree of JavaScript obfuscation.
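A sketch of the JavaScript obfuscation measures in claim 6. The definitions are assumptions: "encoded characters" are `%XX`, `\xXX` and `\uXXXX` escape sequences, counted by character, and a "string concatenation" is a `+` between two string literals or a `.concat(` call. `js_features` is a hypothetical name.

```python
import re

# Percent, \x and \u escape sequences (assumed definition of "encoded").
ENCODED_RE = re.compile(r"%[0-9A-Fa-f]{2}|\\x[0-9A-Fa-f]{2}|\\u[0-9A-Fa-f]{4}")
# '+' between two string literals, or a .concat( call.
CONCAT_RE = re.compile(r"[\"']\s*\+\s*[\"']|\.concat\(")


def js_features(scripts: list[str]):
    """Return (max script length, encoded-character ratio, concatenation count)."""
    if not scripts:
        return 0, 0.0, 0
    max_len = max(len(s) for s in scripts)
    total = sum(len(s) for s in scripts)
    encoded = sum(len(m) for s in scripts for m in ENCODED_RE.findall(s))
    concats = sum(len(CONCAT_RE.findall(s)) for s in scripts)
    return max_len, (encoded / total if total else 0.0), concats
```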
7. The method for detecting and identifying malicious web pages in online social networks according to claim 1 or 4, characterized in that the static web page features identifying obfuscated characters are extracted from the URL set as follows:
collect statistics on all URLs in the set, and extract the maximum URL length, the number of long URLs in the set, and the proportion of encoded characters in the URLs, as a measure of the degree of URL obfuscation.
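The URL measures of claim 7 can be sketched the same way. Here "encoded characters" are assumed to be percent-encoded bytes (`%XX`, three characters each), and the 75-character long-URL threshold and the name `url_features` are hypothetical.

```python
import re

PCT_RE = re.compile(r"%[0-9A-Fa-f]{2}")  # one percent-encoded byte


def url_features(urls: list[str], long_threshold: int = 75):
    """Return (max URL length, number of long URLs, encoded-character ratio)."""
    if not urls:
        return 0, 0, 0.0
    lengths = [len(u) for u in urls]
    encoded = sum(3 * len(PCT_RE.findall(u)) for u in urls)
    n_long = sum(1 for n in lengths if n > long_threshold)
    return max(lengths), n_long, encoded / sum(lengths)
```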
8. The method for detecting and identifying malicious web pages in online social networks according to claim 1, characterized in that the keywords are JavaScript functions or HTML tags whose frequency of occurrence differs between benign and malicious scripts, including: eval, document.write, unescape, fromCharCode, createElement, createTextNode.
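Counting the claim 8 keywords can be as simple as the sketch below. Matching on word boundaries (so `evaluate` is not counted as `eval`) is an assumption, as is the use of raw counts rather than counts normalized by page length; `keyword_frequencies` is a hypothetical name.

```python
import re
from collections import Counter

# Keywords listed in claim 8.
KEYWORDS = ("eval", "document.write", "unescape", "fromCharCode",
            "createElement", "createTextNode")


def keyword_frequencies(source: str) -> Counter:
    """Count whole-word occurrences of each keyword in the page source."""
    counts = Counter()
    for kw in KEYWORDS:
        pattern = r"\b" + re.escape(kw) + r"\b"  # escape the "." in document.write
        counts[kw] = len(re.findall(pattern, source))
    return counts
```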
9. The method for detecting and identifying malicious web pages in online social networks according to claim 1, characterized in that the propagation speed is the frequency with which suspected malicious code appears in web pages per unit time, and the propagation speed of suspicious strings in a web page is calculated as follows:
1) in the association information database, record each <time at which the suspicious field appears, suspicious string content> pair of the web page;
2) query the database for the number of all records that have the same string content and whose time lies within the hour preceding the time t at which the suspicious field appears, giving the propagation speed of the string content in each record;
3) take the maximum of all these propagation speeds as the propagation speed of the web page.
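Claim 9 can be sketched in Python with an in-memory list standing in for the association information database. Each record is a `(time, content)` pair; the speed of a record is the number of stored records with the same content whose time falls in the hour ending at `t` (inclusive at both ends, an assumption), and the page speed is the maximum over the page's records. `page_propagation_speed` is hypothetical.

```python
from datetime import datetime, timedelta


def page_propagation_speed(page_records, database, window=timedelta(hours=1)):
    """Maximum per-record propagation speed for one page.

    page_records: (time, content) pairs for this page's suspicious fields.
    database:     (time, content) pairs for all pages, including this one.
    """
    speeds = [
        # Records with identical content inside [t - window, t].
        sum(1 for (t2, c2) in database if c2 == c and t - window <= t2 <= t)
        for (t, c) in page_records
    ]
    return max(speeds, default=0)
```

The linear scan keeps the sketch short; a real database would more likely answer step 2 with an indexed query on content and time.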
CN201310507897.2A 2013-10-24 2013-10-24 A kind of online social networks malicious web pages detection recognition methods Expired - Fee Related CN103559235B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310507897.2A CN103559235B (en) 2013-10-24 2013-10-24 A kind of online social networks malicious web pages detection recognition methods

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310507897.2A CN103559235B (en) 2013-10-24 2013-10-24 A kind of online social networks malicious web pages detection recognition methods

Publications (2)

Publication Number Publication Date
CN103559235A CN103559235A (en) 2014-02-05
CN103559235B true CN103559235B (en) 2016-08-17

Family

ID=50013482

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310507897.2A Expired - Fee Related CN103559235B (en) 2013-10-24 2013-10-24 A kind of online social networks malicious web pages detection recognition methods

Country Status (1)

Country Link
CN (1) CN103559235B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105592017B (en) * 2014-10-30 2019-03-29 Alibaba Group Holding Limited Defense method and system against cross-site scripting attacks
EP3242240B1 (en) * 2015-02-04 2018-11-21 Nippon Telegraph and Telephone Corporation Malicious communication pattern extraction device, malicious communication pattern extraction system, malicious communication pattern extraction method and malicious communication pattern extraction program
CN105488091A (en) * 2015-06-19 2016-04-13 Harbin Antiy Technology Co., Ltd. Network data detection method and system based on keyword matching
CN105160256A (en) * 2015-08-10 2015-12-16 Shanghai Phicomm Data Communication Technology Co., Ltd. Web page vulnerability detection method and system
CN106095446A (en) * 2016-06-14 2016-11-09 Shenzhen Binxun Technology Co., Ltd. Online software source code detection system and detection method thereof
CN106570401B (en) * 2016-12-27 2019-07-26 Harbin Antiy Technology Co., Ltd. Malicious code detection method and system based on time variation
CN107577783A (en) * 2017-09-15 2018-01-12 University of Electronic Science and Technology of China Automatic web page type identification method based on Web structural feature mining
CN107707561B (en) * 2017-11-01 2020-05-19 Beijing Knownsec Information Technology Co., Ltd. Penetration testing method and device
CN108363925B (en) * 2018-03-16 2021-06-25 Beijing Qihoo Technology Co., Ltd. Method and device for identifying web page mining scripts
CN108399337B (en) * 2018-03-16 2021-07-30 Beijing Qihoo Technology Co., Ltd. Method and device for identifying web page mining scripts
CN108427883B (en) * 2018-03-16 2021-09-24 Beijing Qihoo Technology Co., Ltd. Method and device for detecting web page mining scripts
CN108694042B (en) * 2018-06-15 2021-08-31 Fuzhou University Method for deobfuscating JavaScript code in web pages
CN108985059B (en) * 2018-06-29 2021-09-24 Beijing Qihoo Technology Co., Ltd. Web page backdoor detection method, device, equipment and storage medium
CN108920955B (en) * 2018-06-29 2022-03-11 Beijing Qihoo Technology Co., Ltd. Web page backdoor detection method, device, equipment and storage medium
CN108920950B (en) * 2018-06-29 2022-03-08 Beijing Qihoo Technology Co., Ltd. Web page backdoor detection method, device, equipment and storage medium
CN109525553B (en) * 2018-10-12 2021-06-11 Purple Mountain Laboratory for Network Communication and Security Transmission protection method, intermediate device, server and system for URL (Uniform Resource Locator) requests
CN111107048B (en) * 2018-10-29 2021-11-30 China Mobile (Suzhou) Software Technology Co., Ltd. Phishing website detection method and device and storage medium
CN109474629A (en) * 2018-12-28 2019-03-15 Shenzhen Zhuyun Technology Co., Ltd. Honeypot design and implementation method against web crawlers
CN113595967A (en) * 2020-04-30 2021-11-02 Sangfor Technologies Inc. Data identification method, equipment, storage medium and device
CN111797904A (en) * 2020-06-12 2020-10-20 Harbin Antiy Technology Group Co., Ltd. Method and device for detecting tampering of web page features
CN113971284B (en) * 2020-07-24 2024-03-05 China Telecom Corporation Limited JavaScript-based malicious web page detection method, equipment and computer-readable storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
EP1122656A2 (en) * 2000-01-25 2001-08-08 TekInsight.Com. Inc. Universal resource locator and navigation method
CN101350822A (en) * 2008-09-08 2009-01-21 Nankai University Method for discovering and tracing Internet malicious code
CN102663296A (en) * 2012-03-31 2012-09-12 Hangzhou DBAPPSecurity Co., Ltd. Intelligent detection method for web-page-oriented JavaScript malicious code

Non-Patent Citations (2)

Title
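An obfuscation-resistant malicious code variant identification system; Wang Rui (王蕊) et al.; Acta Electronica Sinica; 2011-10-31; pp. 2322-2330 *
Semantics-based malicious code behavior feature extraction and detection method; Wang Rui (王蕊) et al.; Journal of Software; 2012-12-31; pp. 378-393 *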

Also Published As

Publication number Publication date
CN103559235A (en) 2014-02-05

Similar Documents

Publication Publication Date Title
CN103559235B (en) A kind of online social networks malicious web pages detection recognition methods
Rao et al. Detection of phishing websites using an efficient feature-based machine learning framework
Basnet et al. Rule-based phishing attack detection
Liu et al. A novel approach for detecting browser-based silent miner
CN107241352A (en) A kind of net security accident classificaiton and Forecasting Methodology and system
CN107437026B (en) Malicious webpage advertisement detection method based on advertisement network topology
CN108566399B (en) Phishing website identification method and system
CN106095979B (en) URL merging processing method and device
CN103279710B (en) Method and system for detecting malicious codes of Internet information system
Taylor et al. Detecting malicious exploit kits using tree-based similarity searches
CN110177114A Network security threat indicator identification method, device and computer-readable storage medium
CN103067387B Anti-phishing monitoring system and method
CN107341399A Method and device for assessing code file security
CN112989348B (en) Attack detection method, model training method, device, server and storage medium
CN110572359A (en) Phishing webpage detection method based on machine learning
CN113098887A (en) Phishing website detection method based on website joint characteristics
Deshpande et al. Detection of phishing websites using Machine Learning
CN104202291A (en) Anti-phishing method based on multi-factor comprehensive assessment method
CN112464666B (en) Unknown network threat automatic discovery method based on hidden network data
CN108337269A (en) A kind of WebShell detection methods
Geng et al. Favicon-a clue to phishing sites detection
CN107818132A (en) A kind of webpage agent discovery method based on machine learning
Tanaka et al. Phishing site detection using similarity of website structure
Zhang et al. Cross-site scripting (XSS) detection integrating evidences in multiple stages
Singh et al. A survey on different phases of web usage mining for anomaly user behavior investigation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160817

Termination date: 20191024
