CN103559235A - Online social network malicious webpage detection and identification method - Google Patents

Online social network malicious webpage detection and identification method

Info

Publication number
CN103559235A
Authority
CN
China
Prior art keywords
webpage
online social
url
script
social networks
Legal status
Granted
Application number
CN201310507897.2A
Other languages
Chinese (zh)
Other versions
CN103559235B (en)
Inventor
李沁蕾
王蕊
贾晓启
张道娟
Current Assignee
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Application filed by Institute of Information Engineering of CAS
Priority to CN201310507897.2A
Publication of CN103559235A
Application granted
Publication of CN103559235B
Expired - Fee Related (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2119 Authenticating web pages, e.g. with suspicious links

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Economics (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Virology (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to a method for detecting and identifying malicious webpages in online social networks. The method comprises the following steps: 1) for any webpage in an online social network that is to be checked, count the frequency of occurrence of all keywords in the webpage and, based on the webpage source code, divide the webpage into one or more sets of different types: an HTML (Hypertext Markup Language) tag set, a JavaScript set and a URL (Uniform Resource Locator) set; 2) extract features that reveal obfuscation from each non-empty set to obtain the association features of the webpage; 3) build an association information database, update the webpage association features stored in it in real time, and derive the webpage propagation velocity from these association features; 4) identify malicious webpages based on the propagation velocity combined with the statistically obtained features. The method generalizes well and characterizes malicious webpages in online social networks accurately; it also detects and identifies malicious webpages more precisely and efficiently, at lower analysis cost.

Description

Online social network malicious webpage detection and identification method
Technical field
The invention belongs to the field of network security technology. It relates to a method for identifying malicious webpages in online social networks, in particular a method based on extracting features of malicious webpages.
Background art
With the rapid growth of online social networks (Online Social Network, OSN), every major OSN platform now has a huge user base. Combined with the private user information these platforms hold and their potential economic value, this has made them a focus for more and more attackers. Among attacks on online social networks, cross-site scripting (Cross-site Scripting, XSS) is one of the most common and destructive. Network worms that exploit XSS vulnerabilities can infect large numbers of users in a short time and can even disrupt the normal operation of servers. Extracting effective webpage features to improve the identification of malicious webpages in online social networks is therefore an urgent problem.
Existing analysis of malicious webpages in online social networks mostly relies on complex static analysis. The source code of a webpage typically contains elements such as HTML, CSS, URIs and JavaScript; malicious HTML, CSS, URIs or JavaScript can cause the page to behave maliciously when it is loaded in the browser, for example by stealing cookies or opening phishing sites. In an online social network, users can freely enter content of a certain length into text boxes on a webpage, including HTML, CSS, URI and JavaScript code. To guard against malicious code in user input, the content must be statically analyzed when it is submitted: from the separate perspectives of HTML, CSS, URI and JavaScript, formal methods are used to judge whether the structure and content of these elements could produce malicious behavior.
Among malicious webpages, malicious code that exploits XSS vulnerabilities is the most common kind, and many mature analysis techniques exist for it. When analyzing webpages outside online social networks (for example portal or forum sites), analysis starts from code obfuscation: features of obfuscated code in the webpage are extracted to judge whether the webpage contains suspicious malicious code. The extracted features mainly include keywords, JavaScript features (such as length and character counts) and URL features.
Among existing methods for analyzing, detecting and identifying malicious webpages in online social networks, static analysis mostly requires complex analysis steps, takes a long time and responds slowly. The low time cost that static analysis should have compared with dynamic analysis is not fully realized, and the request latency caused by complex analysis also harms network applications. A simple and effective feature extraction method for malicious webpages in online social networks, with lower analysis cost, is therefore urgently needed.
Summary of the invention
To address the problem of detecting and identifying malicious webpages in online social networks, the object of the invention is to propose a detection and identification method based on extracting features of such webpages. After a webpage in the online social network is analyzed, quantifiable features are extracted from the perspectives of keywords, JavaScript, HTML, URLs and characteristics specific to online social networks, and these features are used to identify webpages in the online social network that carry malicious code exploiting XSS vulnerabilities.
The technical solution of the invention is as follows. A method for detecting and identifying malicious webpages in an online social network comprises the following steps:
1) For any webpage in the online social network that is to be checked, count the frequency of occurrence of all keywords in the webpage; based on the source code of the webpage, divide the webpage into one or more sets of different types: an HTML tag set, a JavaScript set and a URL set;
2) From each of these sets that is not empty, extract the static webpage features that reveal obfuscation to obtain suspicious fields, and combine them with the times at which the suspicious fields appeared to obtain the association features of the webpage;
3) Create an association information database that stores the association features of this webpage and updates the association features of the webpages in the database in real time, and extract the webpage propagation velocity from these association features;
4) Detect and identify malicious webpages according to the webpage propagation velocity, combined with the keyword occurrence frequencies obtained by counting and one or more of the detected features: suspicious JavaScript scripts, suspicious HTML tags and suspicious URLs.
Further, code segments matching the HTML tag format are taken from the webpage source code and gathered into the HTML tag set. An HTML tag consists of a start tag and/or an end tag; a start tag is the tag name enclosed in angle brackets, and an end tag is a slash followed by the tag name, enclosed in angle brackets.
Further, JavaScript in the webpage source code appears between <script></script> tags or after "javascript:"; scripts are taken from these positions and gathered into the JavaScript set.
Further, URLs are extracted from the webpage source code by searching for valid strings that begin with a protocol name such as HTTP, HTTPS or FTP, and are gathered into the URL set.
Further, the static features that reveal obfuscation are extracted from the HTML tag set as follows:
Gather information about all tags in the HTML tag set and extract the maximum tag length in the set, the number of long tags, and the proportion of JavaScript strings contained in tags; these serve as a measure of how strongly the HTML tags are obfuscated.
Further, the static features that reveal obfuscation are extracted from the JavaScript set as follows:
Gather information about all scripts in the JavaScript set and extract the maximum script string length in the set, the proportion of encoded characters in the script strings, and the number of string concatenations occurring in the set; these serve as a measure of how strongly the JavaScript is obfuscated.
Further, the static features that reveal obfuscation are extracted from the URL set as follows:
Gather information about all URLs in the set and extract the maximum URL length in the set, the number of long URLs, and the proportion of encoded characters in the URLs; these serve as a measure of how strongly the URLs are obfuscated.
Further, the keywords are JavaScript functions or HTML tags whose frequency of occurrence differs between benign and malicious scripts, including eval, document.write, unescape, fromCharCode, createElement and createTextNode.
Further, the propagation velocity is the frequency with which a suspicious malicious code appears in webpages per unit time. The propagation velocity of the suspicious strings in a webpage is computed as follows:
1) In the association information database, collect the webpage's records of the form <time at which the suspicious field appeared, suspicious string content>;
2) For each record with time t, query the database for all records with the same string content whose time lies within the hour preceding t; their number is the propagation velocity of that record's string content;
3) Take the maximum of all these propagation velocities as the propagation velocity of the webpage.
Beneficial effects of the invention:
1. The invention extracts a set of features of malicious webpages in online social networks that generalizes well.
2. Based on the structural features of webpages, the invention preprocesses each webpage, extracts features by webpage element type, and builds an association information database across webpages.
3. The invention fully accounts for how malicious code propagates in online social networks and, based on this propagation behavior, extracts a set of features targeted specifically at online social networks.
In summary, the proposed method characterizes malicious webpages in online social networks more accurately and detects and identifies them more precisely, more efficiently and at lower analysis cost.
Brief description of the drawings
Fig. 1 is a flow diagram of the method for detecting and identifying malicious webpages in an online social network.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawing. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art from these embodiments without creative effort fall within the scope of protection of the invention.
One embodiment of the method for detecting and identifying malicious webpages in an online social network comprises the following steps:
1) For any webpage in the online social network that is to be checked, count the frequency of occurrence of all given keywords in the webpage;
2) Analyze the webpage structure and divide the webpage into sets of different types according to its source code, decomposing the source code into an HTML set, a JavaScript set and a URL set;
3) From the HTML, JavaScript and URL sets, extract the static webpage features that reveal obfuscation, i.e. the suspicious fields, and combine them with the times at which the suspicious fields appeared to obtain the association features of the webpage;
4) Store the association features of this webpage and update the association information database so that it holds the latest association features; from the database that records the association features of online social network webpages, extract the association features of the webpage;
5) From the association information, extract the webpage propagation velocity, and combine it with the given keywords, suspicious JavaScript, suspicious HTML and suspicious URLs, five features in total, to obtain the feature vector of the webpage;
6) Detect and identify malicious webpages according to the feature vector.
In one embodiment of the invention, keywords are certain JavaScript functions or HTML tags whose frequency of occurrence differs between benign and malicious scripts. Such keywords appear rarely in benign webpages and more frequently in malicious ones, so we treat these fields as webpage keywords and use the frequency with which they occur in a webpage to judge whether the webpage is malicious.
In one embodiment of the invention, the components of the webpage are obtained by analyzing its structure: the webpage is preprocessed with the goal of dividing it into sets of different types. Because all extracted features come from HTML tags, JavaScript scripts and URLs, preprocessing divides the webpage source code into an HTML tag set, a JavaScript script set and a URL set. The subsequent steps then only need to extract the relevant information from each of these three sets, which avoids the long processing time caused by handling a large amount of data every time; in addition, analyzing the classified sets separately makes the extracted features more accurate.
In one embodiment of the invention, the static webpage features are extracted on the basis that they reveal obfuscation in the webpage. For the three sets the extraction is as follows:
(1) For the HTML tag set: gather information about all tags in the set and extract the maximum tag length, the number of long tags, and the proportion of JavaScript strings contained in tags. These quantified statistics serve as a measure of how strongly the HTML tags are obfuscated.
(2) For the JavaScript script set: gather information about all scripts in the set and extract the maximum script string length, the proportion of encoded characters in the script strings, and the number of string concatenations in the set. These quantified statistics serve as a measure of how strongly the JavaScript is obfuscated.
(3) For the URL set: gather information about all URLs in the set and extract the maximum URL length, the number of long URLs, and the proportion of encoded characters in the URLs. These quantified statistics serve as a measure of how strongly the URLs are obfuscated.
In one embodiment of the invention, malicious code propagates differently in social networks than in general networks. Most visibly, the high clustering and short average path length of social network topologies make malicious code spread far faster in social networks than in general networks. To quantify this, the invention defines propagation velocity as the frequency with which a suspicious malicious code appears in webpages per unit time; computing it depends on the number of times the suspicious code appeared in the webpages sent by the server within the most recent hour. Extracting this feature requires knowing all webpages detected during the past unit of time, so an association information database is created that updates the association features of online social network webpages in real time, and the required features are computed from it.
In one embodiment of the invention, the association information database must be updated continuously and must hold the association information of all webpages. After the association features of a webpage are obtained, its association information is saved to the database. Only association information from the most recent hour is needed; anything older than one hour is never consulted. To improve update and access efficiency, the database is therefore maintained every ten minutes by deleting all information older than one hour.
Fig. 1 shows the flow of the method for detecting and identifying malicious webpages in an online social network, which comprises the following steps:
1. Extract the first group of features: keyword features.
When a malicious webpage loads in the client browser, it can carry out attacks, and these behaviors are realized by executing a combination of functions. When keywords are analyzed statically, the number of times each keyword occurs is used as its feature, in place of the keyword execution order used in dynamic analysis. Statistics show that some script functions may appear in any webpage, but how often they are used differs greatly. Keywords may include, but are not limited to, eval, document.write, unescape, fromCharCode, createElement and createTextNode; those skilled in the art know how to select keywords for given malicious-webpage vulnerabilities, so the types listed above do not limit the keywords. Take eval, which executes code supplied as a string: it is a legitimate function found in many kinds of webpages, but it generally occurs with low frequency. In malicious webpages, eval occurs more often than usual, so features of keywords of this kind can be extracted and used as an indicator of malicious webpages.
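A minimal Python sketch of the keyword-frequency feature follows. It assumes that a keyword counts only when it stands alone as an identifier in the raw page source; the patent does not state a matching rule. The keyword tuple is the example list given above, which the text says is not exhaustive, and `keyword_frequencies` is a hypothetical helper name.

```python
import re

# Example keywords from the description; the list is explicitly non-exhaustive.
KEYWORDS = ("eval", "document.write", "unescape",
            "fromCharCode", "createElement", "createTextNode")


def keyword_frequencies(source: str, keywords=KEYWORDS) -> dict:
    """Count how often each keyword occurs in the page source.

    Assumption: a match must not be glued to other identifier characters,
    so 'evaluate' does not count as 'eval', while 'String.fromCharCode'
    does count as 'fromCharCode'.
    """
    counts = {}
    for kw in keywords:
        # re.escape makes the '.' in 'document.write' match literally.
        pattern = r"(?<![\w$])" + re.escape(kw) + r"(?![\w$])"
        counts[kw] = len(re.findall(pattern, source))
    return counts
```

Raw counts are used here; normalizing by page length would be an equally plausible reading of "frequency".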
2. Preprocess the webpage: classify the webpage source code by element type.
Webpage source code contains many kinds of elements, most basically HTML tags, JavaScript scripts and URLs. One starting point of the invention is to look for traces of malicious code in HTML tags, JavaScript scripts and URLs. To make the method easier to implement, the webpage is preprocessed once before the remaining feature groups are extracted, yielding one set for each of the three element types.
The preprocessing is as follows:
1) HTML tags have a standard format. A tag consists of a start tag and an end tag; the start tag is the tag name enclosed in angle brackets, and the end tag is a slash and the tag name enclosed in angle brackets. Some tags have no end tag, such as <br/>. Code segments matching the HTML tag format are taken from the webpage source code and gathered into a set.
2) In a webpage, JavaScript usually appears between <script></script> tags or after "javascript:". The source code is scanned for these positions, and the JavaScript found there is extracted and gathered into a set.
3) A URL is the address of a resource on the Internet and follows a unified standard; a webpage may reference resources in its own domain or in other domains. A URL begins with a protocol name, and the protocols commonly used on the Internet are few, including HTTP, HTTPS and FTP. To collect the URL set, it therefore suffices to search the source code for valid strings that begin with a protocol name and extract them as URLs.
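The three preprocessing rules can be approximated with regular expressions, as in the Python sketch below. The patterns and the `preprocess` name are hypothetical, not the patent's implementation; a production system would more likely use a tolerant HTML parser, since regular expressions mis-handle edge cases such as `<` inside script text.

```python
import re

# Hypothetical patterns approximating the three extraction rules.
# Start tags, end tags and self-closing tags such as <br/>:
TAG_RE = re.compile(r"</?[A-Za-z][A-Za-z0-9-]*(?:\s[^<>]*)?/?>")
# Script bodies between <script ...> and </script>:
SCRIPT_RE = re.compile(r"<script\b[^>]*>(.*?)</script\s*>", re.I | re.S)
# Code following a "javascript:" pseudo-protocol, up to the closing quote:
JS_URI_RE = re.compile(r"javascript:([^\"'>]*)", re.I)
# Strings beginning with one of the listed protocol names:
URL_RE = re.compile(r"\b(?:https?|ftp)://[^\s\"'<>]+", re.I)


def preprocess(source: str) -> dict:
    """Split page source into the HTML tag, JavaScript and URL sets."""
    # Skip empty bodies such as <script src="x.js"></script>.
    scripts = [body for body in SCRIPT_RE.findall(source) if body.strip()]
    scripts += JS_URI_RE.findall(source)
    return {
        "html": TAG_RE.findall(source),
        "javascript": scripts,
        "url": URL_RE.findall(source),
    }
```

Lists are returned instead of Python sets so that duplicate tags or URLs still count toward the per-set statistics.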
3. Extract the second group of features: HTML tag features.
HTML tags form the structure of a webpage. Tags can be added and deleted dynamically by scripts, tag attributes can be modified dynamically by scripts (for example value), and some attributes are acted on automatically (for example src). HTML tags are therefore a good hiding place for malicious scripts. Ordinary HTML tags have limited length; if a malicious script is hidden in a tag, the tag is likely to be longer than the tags of benign webpages.
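A Python sketch of the three HTML tag statistics follows. Two details are assumptions because the patent leaves them open. First, the "long tag" cut-off is a hypothetical value. Second, "JavaScript contained in a tag" is read as event-handler attribute values plus code after a `javascript:` URI, measured as a share of all tag characters.

```python
import re

# Hypothetical cut-off for a "long" tag; the patent gives no value.
LONG_TAG_THRESHOLD = 100
# Assumption: JavaScript inside a tag means event-handler attribute values
# (onclick="...", onerror='...') and code after a "javascript:" URI.
INLINE_JS_RE = re.compile(
    r"""\bon[a-z]+\s*=\s*(?:"([^"]*)"|'([^']*)')|javascript:([^"'>]*)""",
    re.I)


def html_tag_features(tags, long_threshold=LONG_TAG_THRESHOLD) -> dict:
    """Max tag length, number of long tags, share of tag characters that are JavaScript."""
    if not tags:
        return {"max_len": 0, "long_tags": 0, "js_ratio": 0.0}
    total_chars = sum(len(t) for t in tags)
    js_chars = 0
    for tag in tags:
        # findall yields one 3-tuple per match; groups that did not match are ''.
        for groups in INLINE_JS_RE.findall(tag):
            js_chars += sum(len(g) for g in groups)
    return {
        "max_len": max(len(t) for t in tags),
        "long_tags": sum(1 for t in tags if len(t) > long_threshold),
        "js_ratio": js_chars / total_chars,
    }
```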
4. Extract the third group of features: JavaScript script features.
XSS malicious code is usually written in JavaScript. Beyond making the code aggressive, authors of malicious code often obfuscate their scripts to mislead victims, reducing readability so that victims do not notice them. A common obfuscation technique is to encode the malicious code. An encoded script is noticeably longer, and the proportion of encoded characters in its strings also increases.
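A Python sketch of the three JavaScript statistics follows. It assumes that "encoded characters" are `%XX`, `\xXX` and `\uXXXX` escape sequences. It also assumes that a string concatenation is a `+` adjacent to a string literal or a `.concat(` call. Neither definition comes from the patent.

```python
import re

# Assumption: "encoded characters" are %XX, \xXX and \uXXXX escape sequences.
ENCODED_RE = re.compile(r"%[0-9A-Fa-f]{2}|\\x[0-9A-Fa-f]{2}|\\u[0-9A-Fa-f]{4}")
# Assumption: a concatenation is a '+' next to a string literal, or a .concat( call.
CONCAT_RE = re.compile(r"""["']\s*\+|\+\s*["']|\.concat\(""")


def javascript_features(scripts) -> dict:
    """Max script length, share of encoded characters, and concatenation count."""
    if not scripts:
        return {"max_len": 0, "encoded_ratio": 0.0, "concats": 0}
    total_chars = sum(len(s) for s in scripts)
    # Every escape sequence contributes all of its characters to the encoded count.
    encoded_chars = sum(len(m) for s in scripts for m in ENCODED_RE.findall(s))
    return {
        "max_len": max(len(s) for s in scripts),
        "encoded_ratio": encoded_chars / total_chars,
        "concats": sum(len(CONCAT_RE.findall(s)) for s in scripts),
    }
```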
5. Extract the fourth group of features: URL features.
When a webpage contains reflected XSS, its source code contains the malicious URL that triggers the XSS, and this URL carries a malicious script. To lure users into clicking it, the author deliberately disguises the URL so that users cannot make out the content of its parameter section, tricking them into clicking the URL while off guard.
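The URL statistics can be sketched in Python in the same way. The "long URL" cut-off is a hypothetical value. "Coded characters" is assumed to mean percent-encoded bytes such as `%3C`, which is how a disguised parameter like `<script>` typically appears in a reflected-XSS link.

```python
import re

LONG_URL_THRESHOLD = 200  # hypothetical cut-off; the patent gives no value
# Assumption: "coded characters" in a URL are percent-encoded bytes such as %3C.
PCT_RE = re.compile(r"%[0-9A-Fa-f]{2}")


def url_features(urls, long_threshold=LONG_URL_THRESHOLD) -> dict:
    """Max URL length, number of long URLs, share of percent-encoded characters."""
    if not urls:
        return {"max_len": 0, "long_urls": 0, "encoded_ratio": 0.0}
    total_chars = sum(len(u) for u in urls)
    # Each percent-escape occupies three characters of the URL.
    encoded_chars = sum(3 * len(PCT_RE.findall(u)) for u in urls)
    return {
        "max_len": max(len(u) for u in urls),
        "long_urls": sum(1 for u in urls if len(u) > long_threshold),
        "encoded_ratio": encoded_chars / total_chars,
    }
```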
6. Store the webpage association information and update the association information database.
An association information database is set up to store the association information of webpages. Association information means the suspicious fields in a webpage (such as suspicious JavaScript strings, suspicious URLs and suspicious HTML tags) together with the time at which each suspicious field appeared. Propagation velocity must be computed, and velocity depends directly on time, so time is an important field in the database.
After the first five steps have been applied to a webpage, a set of suspicious strings for that webpage is obtained. The association information of these suspicious strings is then saved so that the propagation velocity of webpages appearing later in network traffic can be computed easily: a group of records of the form <time at which the suspicious field appeared, suspicious string content> is inserted into the database. In addition, to keep the database efficient, redundant content is purged every ten minutes by deleting all records older than one hour. This reduces the size of the database and keeps the association information database promptly updated and maintained.
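A minimal Python sketch of the association information database follows, built on the standard-library `sqlite3` module. The patent does not name a storage engine, so this choice is an assumption. The `AssociationDB` class and its methods are hypothetical. The "every ten minutes" purge is also simplified: it runs lazily when an insert arrives at least ten minutes after the previous purge, not on a timer.

```python
import sqlite3

WINDOW = 3600       # keep one hour of history (seconds)
PRUNE_EVERY = 600   # purge every ten minutes (seconds)


class AssociationDB:
    """Association information database holding <t, C> records."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS assoc (t REAL NOT NULL, content TEXT NOT NULL)")
        # Velocity queries look up records by content and time range.
        self.conn.execute(
            "CREATE INDEX IF NOT EXISTS assoc_content_t ON assoc (content, t)")
        self.last_prune = None

    def add_page(self, t, suspicious_strings):
        """Insert one <t, C> record per suspicious string found in a page at time t."""
        self.conn.executemany(
            "INSERT INTO assoc (t, content) VALUES (?, ?)",
            [(t, c) for c in suspicious_strings])
        self.conn.commit()
        # Assumption: the ten-minute purge is triggered lazily on insert.
        if self.last_prune is None or t - self.last_prune >= PRUNE_EVERY:
            self.prune(t)

    def prune(self, now):
        """Delete every record older than one hour relative to `now`."""
        self.conn.execute("DELETE FROM assoc WHERE t < ?", (now - WINDOW,))
        self.conn.commit()
        self.last_prune = now

    def count(self):
        return self.conn.execute("SELECT COUNT(*) FROM assoc").fetchone()[0]
```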
7. Extract the fifth group of features: webpage association features.
In social networks, similar XSS-based malicious code spreads quickly across webpages, so propagation velocity is an effective feature for identifying malicious webpages. A method is therefore needed to extract, from UTF-8 encoded webpage source code, a quantitative feature that reflects propagation velocity.
By analogy with the simple definition of scalar speed (the distance an object travels per unit time), propagation velocity is defined as the number of times a string appears in webpages per unit time. The propagation velocity of the suspicious strings in a webpage is computed as follows:
1) Take the webpage's <time t, string content C> records from step 6;
2) For each record, query the database for all records with the same string content C whose time lies within the hour preceding t; this count is the propagation velocity of that record;
3) Take the maximum of all these propagation velocities as the propagation velocity of the webpage.
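The three steps above can be sketched in Python over plain `(t, C)` tuples. The sketch assumes that "within the last hour at t" means the closed interval [t - 3600 s, t]. It uses hypothetical names and is independent of any particular database. Timestamps are grouped by content and sorted, so each count in step 2 costs two binary searches.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

WINDOW = 3600  # "the last hour", in seconds


def page_velocity(page_records, db_records, window=WINDOW) -> int:
    """Propagation velocity of a page, following steps 1)-3) above.

    page_records: (t, C) pairs for this page's suspicious strings.
    db_records:   all (t, C) pairs currently in the association database.
    Assumption: "within the last hour at t" is the closed interval [t - window, t].
    """
    # Index the database by string content, with sorted timestamps.
    times = defaultdict(list)
    for t, content in db_records:
        times[content].append(t)
    for ts in times.values():
        ts.sort()

    best = 0
    for t, content in page_records:
        ts = times.get(content, [])
        # Step 2: number of records with the same content in [t - window, t].
        n = bisect_right(ts, t) - bisect_left(ts, t - window)
        # Step 3: keep the maximum over all records of the page.
        best = max(best, n)
    return best
```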
8. Merge the five feature groups obtained in steps 1, 3, 4, 5 and 7 to form the feature vector of the webpage.
Online social networks differ from general network applications. XSS malicious code propagates differently in social networks than in general networks; most visibly, the high clustering and short average path length of social network topologies make XSS malicious code spread far faster in social networks than in general networks. A real example illustrates this: the 2003 computer virus Blaster infected 336,000 computers in 20 hours, whereas the social network XSS worm Samy infected 1,000,000 users in 20 hours. These figures show that the average number of users infected per unit time is about 3 times higher in the online social network than in the general network. If propagation velocity can be captured, malicious webpages in network traffic can therefore be distinguished better.
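Step 8 can be sketched in Python as a fixed-layout concatenation. The patent does not give the vector layout or name the classifier that consumes it, so the ordering and key names below are hypothetical. They mirror the statistics listed for each set. Missing entries default to 0, so a page with an empty set still yields a vector of the same length.

```python
# Hypothetical fixed layout of the merged vector (steps 1, 3, 4, 5 and 7).
KEYWORDS = ("eval", "document.write", "unescape",
            "fromCharCode", "createElement", "createTextNode")
HTML_KEYS = ("max_len", "long_tags", "js_ratio")
JS_KEYS = ("max_len", "encoded_ratio", "concats")
URL_KEYS = ("max_len", "long_urls", "encoded_ratio")


def feature_vector(kw_counts, html_f, js_f, url_f, velocity):
    """Concatenate the five feature groups into one flat list of floats."""
    vec = [float(kw_counts.get(k, 0)) for k in KEYWORDS]   # step 1: keywords
    vec += [float(html_f.get(k, 0)) for k in HTML_KEYS]    # step 3: HTML tags
    vec += [float(js_f.get(k, 0)) for k in JS_KEYS]        # step 4: JavaScript
    vec += [float(url_f.get(k, 0)) for k in URL_KEYS]      # step 5: URLs
    vec.append(float(velocity))                            # step 7: velocity
    return vec
```

A fixed order matters because any downstream classifier expects feature i to mean the same thing for every page.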
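The "about 3 times" figure follows from the two infection counts over the same 20-hour period:

```latex
\text{Blaster: } \frac{336\,000}{20\ \text{h}} = 16\,800\ \text{h}^{-1},\qquad
\text{Samy: } \frac{1\,000\,000}{20\ \text{h}} = 50\,000\ \text{h}^{-1},\qquad
\frac{50\,000}{16\,800} \approx 2.98 \approx 3
```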
Experimental data:
Metric          Malicious   Benign
Sample count    11,761      18,302
Precision       87.1%       96.1%
Recall          94.3%       91.1%
F-Measure       90.6%       93.5%
The test results in the table above show that detection using the features proposed in the invention reaches about 90% on average, a good detection result, indicating that the "propagation velocity" proposed in the invention plays an important role in identifying malicious webpages in online social networks.
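The F-Measure row is consistent with the usual F1 score, the harmonic mean of precision P and recall R. The check below assumes that is the definition the authors used. The unweighted mean of the two F-measures is about 92%.

```latex
F = \frac{2PR}{P+R},\qquad
F_{\text{malicious}} = \frac{2(0.871)(0.943)}{0.871+0.943} = \frac{1.6427}{1.814} \approx 0.906,\qquad
F_{\text{benign}} = \frac{2(0.961)(0.911)}{0.961+0.911} = \frac{1.7509}{1.872} \approx 0.935
```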

Claims (9)

1. A method for detecting and identifying malicious webpages in an online social network, comprising the steps of:
1) for any webpage in the online social network that is to be checked, counting the frequency of occurrence of all keywords in the webpage, and, based on the source code of the webpage, dividing the webpage into one or more sets of different types: an HTML tag set, a JavaScript set and a URL set;
2) from each of these sets that is not empty, extracting the static webpage features that reveal obfuscation to obtain suspicious fields, and combining them with the times at which the suspicious fields appeared to obtain the association features of the webpage;
3) creating an association information database that stores the association features of this webpage and updates the association features of the webpages in the database in real time, and extracting the webpage propagation velocity from these association features;
4) detecting and identifying malicious webpages according to the webpage propagation velocity, combined with the keyword occurrence frequencies obtained by counting and one or more of the detected features: suspicious JavaScript scripts, suspicious HTML tags and suspicious URLs.
2. The method according to claim 1, characterized in that code segments matching HTML tags are extracted from the webpage source code and collected into the HTML tag set; an HTML tag consists of a start tag and/or an end tag, where the start tag is an element name enclosed in angle brackets, and the end tag is a slash followed by an element name, enclosed in angle brackets.
3. The method according to claim 1, characterized in that JavaScript scripts in the webpage source code appear between <script></script> tags or after "javascript:"; the JavaScript scripts are extracted according to these positions and collected into the JavaScript set.
4. The method according to claim 1, characterized in that the URL set is obtained by searching the webpage source code for valid strings that begin with the protocol name HTTP, HTTPS or FTP, and separating and extracting these strings as URLs.
5. The method according to claim 1 or 2, characterized in that the static webpage features that identify obfuscated characters are extracted from the HTML tag set as follows:
Collect information on all tags in the HTML tag set, and extract the maximum tag length in the set, the number of long tags, and the proportion of JavaScript strings contained in the tags, as measures of the degree of HTML tag obfuscation.
6. The method according to claim 1 or 3, characterized in that the static webpage features that identify obfuscated characters are extracted from the JavaScript set as follows:
Collect information on all scripts in the JavaScript set, and extract the maximum length of the script strings in the set, the proportion of encoded characters in the script strings, and the number of string concatenations that occur in the set, as measures of the degree of JavaScript obfuscation.
7. The method according to claim 1 or 4, characterized in that the static webpage features that identify obfuscated characters are extracted from the URL set as follows:
Collect information on all URLs in the set, and extract the maximum URL length in the set, the number of long URLs, and the proportion of encoded characters in the URLs, as measures of the degree of URL obfuscation.
8. The method according to claim 1, characterized in that the keywords are JavaScript functions or HTML tags whose frequency of occurrence differs between benign and malicious scripts, including: eval, document.write, unescape, fromCharCode, createElement, createTextNode.
9. The method according to claim 1, characterized in that the propagation speed is the frequency with which suspicious malicious code appears in webpages per unit time, and the propagation speed of the suspicious strings in a webpage is calculated as follows:
1) record the webpage's <time at which the suspicious field appears, suspicious string content> entries in the association information database;
2) for each record, query the database for records with identical string content whose times fall within the last hour before time t, and take their number as the propagation speed of that string content;
3) record the maximum of all these propagation speeds and use it as the propagation speed of the webpage.
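Claims 2 to 4 describe how the page source is split into an HTML tag set, a JavaScript set and a URL set. The following Python sketch makes several assumptions. The regular expressions and the function name `partition_page` are hypothetical approximations of the claimed rules, not the patent's implementation. Regex-based extraction will also miss cases that a real HTML parser would handle.

```python
import re

# Hypothetical regexes approximating claims 2-4 (not the patent's own rules).
# Claim 2: start tags <name ...> and end tags </name>.
TAG_RE = re.compile(r"</?[A-Za-z][A-Za-z0-9]*(?:\s[^<>]*)?>")
# Claim 3: script bodies between <script> and </script>, or after "javascript:".
SCRIPT_BLOCK_RE = re.compile(r"<script\b[^>]*>(.*?)</script\s*>", re.IGNORECASE | re.DOTALL)
JS_URI_RE = re.compile(r"javascript:([^\"'>]+)", re.IGNORECASE)
# Claim 4: strings that begin with an HTTP, HTTPS or FTP protocol name.
URL_RE = re.compile(r"\b(?:https?|ftp)://[^\s\"'<>]+", re.IGNORECASE)


def partition_page(source: str) -> dict[str, list[str]]:
    """Split page source into the HTML tag, JavaScript and URL sets of claim 1."""
    # Keep only non-empty inline script bodies (e.g. skip <script src=...></script>).
    scripts = [body for body in SCRIPT_BLOCK_RE.findall(source) if body.strip()]
    scripts += JS_URI_RE.findall(source)
    return {
        "html_tags": TAG_RE.findall(source),
        "javascript": scripts,
        "urls": URL_RE.findall(source),
    }
```

For example, `partition_page('<a href="javascript:alert(1)">x</a>')` puts `alert(1)` in the JavaScript set, puts both tags in the HTML tag set, and leaves the URL set empty.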
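Claims 5 to 7 name the obfuscation measures for each set but not their parameters. The Python sketch below computes them under the following hypothetical choices, none of which come from the patent:

- a length threshold that makes a tag or URL "long";
- the escape sequences counted as "encoded characters";
- an inline-JavaScript test for tags;
- a literal-`+`-literal pattern used to count string concatenations.

```python
import re

# Hypothetical parameters: claims 5-7 name these features but not their values.
LONG_ITEM_LEN = 100  # a tag or URL longer than this counts as "long"
ENCODED_RE = re.compile(  # %XX, \xXX, \uXXXX and HTML numeric character references
    r"%[0-9A-Fa-f]{2}|\\x[0-9A-Fa-f]{2}|\\u[0-9A-Fa-f]{4}|&#x?[0-9A-Fa-f]+;"
)
INLINE_JS_RE = re.compile(r"javascript:|\son[a-z]+\s*=", re.IGNORECASE)
CONCAT_RE = re.compile(r"[\"']\s*\+\s*[\"']")  # string literal + string literal


def encoded_ratio(items: list[str]) -> float:
    """Share of all characters that belong to an encoded escape sequence."""
    total = sum(len(s) for s in items)
    encoded = sum(len(m.group()) for s in items for m in ENCODED_RE.finditer(s))
    return encoded / total if total else 0.0


def html_tag_features(tags: list[str]) -> dict[str, float]:
    """Claim 5: longest tag, number of long tags, share of tags carrying JavaScript."""
    js_tags = sum(bool(INLINE_JS_RE.search(t)) for t in tags)
    return {
        "max_len": max((len(t) for t in tags), default=0),
        "long_count": sum(len(t) > LONG_ITEM_LEN for t in tags),
        "js_ratio": js_tags / len(tags) if tags else 0.0,
    }


def script_features(scripts: list[str]) -> dict[str, float]:
    """Claim 6: longest script, encoded-character ratio, string concatenation count."""
    return {
        "max_len": max((len(s) for s in scripts), default=0),
        "encoded_ratio": encoded_ratio(scripts),
        "concat_count": sum(len(CONCAT_RE.findall(s)) for s in scripts),
    }


def url_features(urls: list[str]) -> dict[str, float]:
    """Claim 7: longest URL, number of long URLs, encoded-character ratio."""
    return {
        "max_len": max((len(u) for u in urls), default=0),
        "long_count": sum(len(u) > LONG_ITEM_LEN for u in urls),
        "encoded_ratio": encoded_ratio(urls),
    }
```

Counting only literal-to-literal `+` undercounts concatenations that involve variables. A tokenizer-based count would be more faithful, but it would also be much longer.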
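Step 1) of claim 1 counts keyword frequencies, and claim 8 lists the keywords. The following is a minimal Python sketch with two assumptions. First, it uses plain occurrence counts over the raw page source, since the patent does not say whether counts are normalized. Second, a keyword counts only when it is not part of a longer identifier, so `document.writeln` does not count as `document.write`.

```python
import re
from collections import Counter

# Keywords listed in claim 8.
KEYWORDS = ("eval", "document.write", "unescape", "fromCharCode",
            "createElement", "createTextNode")


def keyword_frequencies(source: str) -> Counter:
    """Count occurrences of each claim-8 keyword that are not part of a longer identifier."""
    counts = Counter()
    for kw in KEYWORDS:
        # Reject matches glued to identifier characters, e.g. "retrieval" or "writeln".
        pattern = r"(?<![\w$])" + re.escape(kw) + r"(?![\w$])"
        counts[kw] = len(re.findall(pattern, source))
    return counts
```

A preceding `.` is allowed, so `String.fromCharCode(...)` and `window.eval(...)` are still counted.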
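Claim 9 defines propagation speed as the number of records with the same suspicious string content seen in the hour before time t. The webpage's speed is then the maximum over its suspicious strings. The following is a minimal in-memory Python sketch. The class `AssociationDB` is a hypothetical stand-in for the association information database. The sketch also treats the one-hour window as inclusive at both ends, which the claim does not specify.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(hours=1)  # claim 9 counts records from the last hour


class AssociationDB:
    """Hypothetical in-memory stand-in for the association information database."""

    def __init__(self) -> None:
        # suspicious string content -> times at which it was observed
        self._seen = defaultdict(list)

    def record(self, seen_at: datetime, content: str) -> None:
        """Step 1: store one <time, suspicious string content> record."""
        self._seen[content].append(seen_at)

    def speed(self, content: str, t: datetime) -> int:
        """Step 2: number of records with this content in the hour up to time t."""
        return sum(t - WINDOW <= ts <= t for ts in self._seen.get(content, []))

    def page_speed(self, suspicious: list[str], t: datetime) -> int:
        """Step 3: the largest speed among the page's suspicious strings."""
        return max((self.speed(s, t) for s in suspicious), default=0)
```

A production system would keep these records in a persistent store with an index on content and time, and would expire entries older than the window. The dictionary here only illustrates the counting rule.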
CN201310507897.2A 2013-10-24 2013-10-24 Online social network malicious webpage detection and identification method Expired - Fee Related CN103559235B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310507897.2A CN103559235B (en) 2013-10-24 2013-10-24 Online social network malicious webpage detection and identification method

Publications (2)

Publication Number Publication Date
CN103559235A (en) 2014-02-05
CN103559235B (en) 2016-08-17

Family

ID=50013482

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310507897.2A Expired - Fee Related CN103559235B (en) 2013-10-24 2013-10-24 Online social network malicious webpage detection and identification method

Country Status (1)

Country Link
CN (1) CN103559235B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1122656A2 (en) * 2000-01-25 2001-08-08 TekInsight.Com. Inc. Universal resource locator and navigation method
CN101350822A (en) * 2008-09-08 2009-01-21 南开大学 Method for discovering and tracing Internet malicious code
CN102663296A (en) * 2012-03-31 2012-09-12 杭州安恒信息技术有限公司 Intelligent webpage-oriented detection method for JavaScript malicious code

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
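Wang Rui et al.: "An obfuscation-resistant malicious code variant identification system", Acta Electronica Sinica, 31 October 2011 (2011-10-31) *
Wang Rui et al.: "Semantics-based malicious code behavior feature extraction and detection method", Journal of Software, 31 December 2012 (2012-12-31) *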

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105592017A (en) * 2014-10-30 2016-05-18 阿里巴巴集团控股有限公司 Method and system for defending against cross-site scripting attacks
CN105592017B (en) * 2014-10-30 2019-03-29 阿里巴巴集团控股有限公司 Defense method and system for cross-site scripting attacks
CN107209834A (en) * 2015-02-04 2017-09-26 日本电信电话株式会社 Malicious communication pattern extraction apparatus, malicious communication schema extraction system, malicious communication schema extraction method and malicious communication schema extraction program
CN105488091A (en) * 2015-06-19 2016-04-13 哈尔滨安天科技股份有限公司 Network data detection method and system based on keyword matching
CN105160256A (en) * 2015-08-10 2015-12-16 上海斐讯数据通信技术有限公司 Web page vulnerability detection method and system
CN106095446A (en) * 2016-06-14 2016-11-09 深圳市彬讯科技有限公司 Online software source code detection system and detection method thereof
CN106570401A (en) * 2016-12-27 2017-04-19 哈尔滨安天科技股份有限公司 Method and system for detecting malicious code based on time variation
CN106570401B (en) * 2016-12-27 2019-07-26 哈尔滨安天科技股份有限公司 Malicious code detection method and system based on time variation
CN107577783A (en) * 2017-09-15 2018-01-12 电子科技大学 Automatic webpage type identification method based on mining Web structural features
CN107707561A (en) * 2017-11-01 2018-02-16 北京知道创宇信息技术有限公司 Penetration testing method and device
CN107707561B (en) * 2017-11-01 2020-05-19 北京知道创宇信息技术股份有限公司 Penetration testing method and device
CN108363925A (en) * 2018-03-16 2018-08-03 北京奇虎科技有限公司 Method and device for identifying webpage mining scripts
CN108427883B (en) * 2018-03-16 2021-09-24 北京奇虎科技有限公司 Method and device for detecting webpage mining scripts
CN108363925B (en) * 2018-03-16 2021-06-25 北京奇虎科技有限公司 Method and device for identifying webpage mining scripts
CN108427883A (en) * 2018-03-16 2018-08-21 北京奇虎科技有限公司 Method and device for detecting webpage mining scripts
CN108399337A (en) * 2018-03-16 2018-08-14 北京奇虎科技有限公司 Method and device for identifying webpage mining scripts
CN108694042B (en) * 2018-06-15 2021-08-31 福州大学 Method for deobfuscating JavaScript code in webpages
CN108694042A (en) * 2018-06-15 2018-10-23 福州大学 Method for deobfuscating JavaScript code in webpages
CN108985059A (en) * 2018-06-29 2018-12-11 北京奇虎科技有限公司 Webpage backdoor detection method, apparatus, device and storage medium
CN108920950A (en) * 2018-06-29 2018-11-30 北京奇虎科技有限公司 Webpage backdoor detection method, apparatus, device and storage medium
CN108920955A (en) * 2018-06-29 2018-11-30 北京奇虎科技有限公司 Webpage backdoor detection method, apparatus, device and storage medium
CN109525553A (en) * 2018-10-12 2019-03-26 上海拟态数据技术有限公司 Transmission protection method for URL requests, intermediate device, server and system
CN109525553B (en) * 2018-10-12 2021-06-11 网络通信与安全紫金山实验室 Transmission protection method, intermediate device, server and system for URL (Uniform Resource Locator) requests
CN111107048A (en) * 2018-10-29 2020-05-05 中移(苏州)软件技术有限公司 Phishing website detection method and device and storage medium
CN111107048B (en) * 2018-10-29 2021-11-30 中移(苏州)软件技术有限公司 Phishing website detection method and device and storage medium
CN109474629A (en) * 2018-12-28 2019-03-15 深圳竹云科技有限公司 Design and implementation method of an anti-web-crawler honeypot
CN113595967A (en) * 2020-04-30 2021-11-02 深信服科技股份有限公司 Data identification method, equipment, storage medium and device
CN111797904A (en) * 2020-06-12 2020-10-20 哈尔滨安天科技集团股份有限公司 Method and device for detecting tampering of webpage features
CN113971284A (en) * 2020-07-24 2022-01-25 中国电信股份有限公司 JavaScript-based malicious webpage detection method and device and computer-readable storage medium
CN113971284B (en) * 2020-07-24 2024-03-05 中国电信股份有限公司 JavaScript-based malicious webpage detection method, equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN103559235B (en) 2016-08-17

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160817

Termination date: 20191024