CN102663000B - Method for building a malicious URL database, and method and device for identifying malicious URLs - Google Patents

Method for building a malicious URL database, and method and device for identifying malicious URLs

Info

Publication number
CN102663000B
CN102663000B CN201210069443.7A CN201210069443A CN102663000B CN 102663000 B CN102663000 B CN 102663000B CN 201210069443 A CN201210069443 A CN 201210069443A CN 102663000 B CN102663000 B CN 102663000B
Authority
CN
China
Prior art keywords
url
website
network address
detected
chain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210069443.7A
Other languages
Chinese (zh)
Other versions
CN102663000A (en
Inventor
梁知音
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201210069443.7A
Publication of CN102663000A
Application granted
Publication of CN102663000B
Legal status: Active
Anticipated expiration

Abstract

The invention provides a method for building a malicious URL database, and a method and device for identifying malicious URLs. The building method comprises: S1, building a site information association database; S2, building a backlink association database; S3, obtaining known malicious URLs and adding them to a detection queue, then repeating step S4 until the detection queue is empty, and building the malicious URL database from all data that appeared in the detection queue; S4, querying the backlink association database to determine all backlink URLs of the current URL, and adding the backlink URLs whose weight exceeds a preset threshold to the detection queue; or parsing the site attribute information of the current URL, querying the site information association database to determine the site domain names that share site attribute information with the current URL, and adding the site domain names whose weight exceeds the preset threshold to the detection queue. Compared with the prior art, the invention improves the timeliness and accuracy of detection and reduces missed detections.

Description

Method for building a malicious URL database, and method and device for identifying malicious URLs
[Technical Field]
The present invention relates to the field of computer security, and in particular to a method for building a malicious URL database and to a method and device for identifying malicious URLs.
[Background]
With the development of computer network technology, the Internet has become increasingly important and now reaches into every aspect of people's work and life. Malicious activity on the Internet has grown along with it, and the resulting security problems trouble network users considerably. A large number of websites on the Internet currently engage in fraud and other malicious activity; these websites profit illegally and, because their profit channels are concealed, threaten user security. Such illegal websites have short life cycles: once discovered, they are usually banned or deregistered. To stay effective, illegal website operators typically hold a large group of similar sites and swap them in at any time. These site groups are closely associated with one another and have gradually evolved into a large underground industry chain, commonly called the "Internet underground industry chain".
Existing means of detecting malicious URLs include static feature detection and simulated-browser detection. Static detection uses malicious code signatures collected in advance and checks whether a page's HTML (HyperText Markup Language) code contains those signatures; if it does, the URL is judged malicious. The recognition rate of this method is relatively low, and it is easily bypassed by script encryption and encoding. Simulated-browser detection uses a browser environment built in advance to simulate a user visiting the URL; if illegal behavior is observed, the URL is identified as malicious. This approach is inefficient: after a malicious URL is encountered the browser environment may need to be restored, and a fully realistic browser environment is hard to build, which easily leads to missed detections. For the URL pools that illegal website operators rotate through, each URL can be judged only after it has been executed, one by one, so malicious URLs cannot be discovered in advance and timeliness is poor.
[Summary of the Invention]
In view of this, the invention provides a method for building a malicious URL database, and a method and device for identifying malicious URLs, in order to improve the timeliness and accuracy of detection and to reduce missed detections.
The specific technical solution is as follows:
A method for building a malicious URL database, the method comprising the following steps:
S1, associating each site domain name with its corresponding site attribute information in advance to build a site information association database;
S2, building a backlink association database in advance, which stores the link relationships between URLs;
S3, obtaining known malicious URLs and adding them to a detection queue; taking URLs out of the detection queue one by one and performing step S4 on each current URL, until the detection queue is empty; and building the malicious URL database from all URLs or site domain names that were added to the detection queue;
S4, querying the backlink association database to determine all backlink URLs of the current URL, and adding to the detection queue the backlink URLs whose degree of association with the known malicious URLs meets a preset requirement; or
parsing the site attribute information of the current URL, querying the site information association database to determine the site domain names that share site attribute information with the current URL, and adding to the detection queue the site domain names whose degree of association with the known malicious URLs meets the preset requirement.
According to a preferred embodiment of the present invention, the site attribute information includes at least one of the following: site name, site owner, site owner contact information, company information, IP address information, and ICP information.
According to a preferred embodiment of the present invention, step S3 further comprises: assigning an initial weight to the known malicious URLs; setting a backlink factor between URLs that have a backlink relationship; and setting an influence factor for each type of site attribute information shared between site domain names, where the backlink factor and the influence factors take values in the open interval (0, 1);
calculating the degree of association between a backlink URL and the known malicious URLs comprises: multiplying the weight of the current URL by the backlink factor to obtain the weight of the backlink URL;
calculating the degree of association between a site domain name and the known malicious URLs comprises: multiplying the weight of the current URL by the influence factor corresponding to the type of site attribute information that the site domain name shares with the current URL, to obtain the weight of the site domain name;
the degree of association meets the preset requirement when the weight of the backlink URL or site domain name exceeds a preset threshold.
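Written compactly, the weighting rule of this embodiment can be stated as follows. The symbols are notation introduced here for illustration only (the source gives the rule in words): $u$ is the current URL, $b$ one of its backlink URLs, $d$ a site domain name sharing an attribute of type $t$ with $u$, $\alpha_{\mathrm{bl}}$ the backlink factor, $\alpha_t$ the influence factor for type $t$, and $\theta$ the preset threshold.

```latex
\begin{aligned}
w(b) &= \alpha_{\mathrm{bl}} \, w(u), & 0 < \alpha_{\mathrm{bl}} < 1,\\
w(d) &= \alpha_{t} \, w(u),           & 0 < \alpha_{t} < 1,\\
x \in \{b, d\} &\text{ is added to the detection queue} & \iff\; w(x) > \theta .
\end{aligned}
```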
According to a preferred embodiment of the present invention, the malicious URL database further includes: the site attribute information and weights corresponding to all URLs or site domain names that were added to the detection queue.
A method for identifying malicious URLs, the method comprising:
obtaining a URL to be detected and querying whether the malicious URL database contains it; if so, determining that the URL to be detected is a malicious URL;
wherein the malicious URL database is built using the above method for building a malicious URL database.
A method for identifying malicious URLs, the method comprising the following steps:
S201, obtaining a URL to be detected and parsing the site attribute information of that URL;
S202, using the parsed site attribute information to look up, in the malicious URL database, the malicious URLs that have the same attribute information as the URL to be detected, the malicious URL database being built using the above method for building a malicious URL database;
S203, calculating the weight of the URL to be detected from the weights of the malicious URLs found;
S204, judging whether the weight of the URL to be detected exceeds a preset threshold and, if so, identifying the URL to be detected as a malicious URL.
According to a preferred embodiment of the present invention, step S203 is specifically:
merging the weights of the malicious URLs found in step S202 to obtain the weight of the URL to be detected.
According to a preferred embodiment of the present invention, the merge calculation takes the maximum, the average, or the sum.
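Steps S203 and S204 can be sketched as below. This is a minimal illustration, not the patent's implementation: the function names, the default threshold of 0.7, and the choice to score a URL with no matches as 0.0 are all assumptions made here.

```python
from statistics import mean

# The three merge strategies named in the text: maximum, average, sum.
MERGE = {"max": max, "mean": mean, "sum": sum}

def score_url(matched_weights, how="max"):
    """Merge the weights of the matching malicious entries (step S203)."""
    if not matched_weights:
        return 0.0  # assumption: no matching entry means no evidence
    return MERGE[how](matched_weights)

def is_malicious(matched_weights, threshold=0.7, how="max"):
    """Compare the merged weight with the preset threshold (step S204)."""
    return score_url(matched_weights, how) > threshold

# Hypothetical weights of two database entries sharing attributes with the URL.
print(is_malicious([0.9, 0.72], how="max"))
```

Taking the maximum flags a URL as soon as one strong association exists, while the sum lets several weak associations add up; which is appropriate depends on how the threshold was chosen.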
A device for building a malicious URL database, the device comprising:
a site information association module, configured to associate each site domain name with its corresponding site attribute information in advance and build a site information association database;
a backlink association module, configured to build a backlink association database in advance, which stores the link relationships between URLs;
a database building module, configured to obtain known malicious URLs and add them to a detection queue, to take URLs out of the detection queue one by one and provide each current URL to the backlink detection module or the site information detection module until the detection queue is empty, and to build the malicious URL database from all URLs or site domain names that were added to the detection queue;
a backlink detection module, configured to query the backlink association database, determine all backlink URLs of the current URL provided by the database building module, and add to the detection queue the backlink URLs whose degree of association with the known malicious URLs meets a preset requirement;
a site information detection module, configured to parse the site attribute information of the current URL, query the site information association database, determine the site domain names that share site attribute information with the current URL provided by the database building module, and add to the detection queue the site domain names whose degree of association with the known malicious URLs meets the preset requirement.
According to a preferred embodiment of the present invention, the site attribute information includes at least one of the following: site name, site owner, site owner contact information, company information, IP address information, and ICP information.
According to a preferred embodiment of the present invention, the device further comprises:
a factor setting module, configured to set a backlink factor between URLs that have a backlink relationship, and to set an influence factor for each type of site attribute information shared between site domain names, where the backlink factor and the influence factors take values in the open interval (0, 1);
the database building module is further configured to assign an initial weight to the known malicious URLs;
the backlink detection module multiplies the weight of the current URL by the backlink factor to obtain the weight of each backlink URL, the weight of a backlink URL representing its degree of association with the known malicious URLs;
the site information detection module multiplies the weight of the current URL by the influence factor corresponding to the type of site attribute information that the site domain name shares with the current URL to obtain the weight of the site domain name, the weight of a site domain name representing its degree of association with the known malicious URLs.
According to a preferred embodiment of the present invention, the malicious URL database further includes: the site attribute information and weights corresponding to all URLs or site domain names that were added to the detection queue.
A device for identifying malicious URLs, the device comprising: a query-and-judgment module, configured to obtain a URL to be detected and query whether the malicious URL database contains it and, if so, to determine that the URL to be detected is a malicious URL;
wherein the malicious URL database is built using the above device for building a malicious URL database.
A device for identifying malicious URLs, the device comprising:
a parsing module, configured to obtain a URL to be detected and parse the site attribute information of that URL;
a query module, configured to use the parsed site attribute information to look up, in the malicious URL database, the malicious URLs that have the same attribute information as the URL to be detected, the malicious URL database being built using the above device for building a malicious URL database;
a merging module, configured to calculate the weight of the URL to be detected from the weights of the malicious URLs found;
a judgment module, configured to judge whether the weight of the URL to be detected exceeds a preset threshold and, if so, to identify the URL to be detected as a malicious URL.
According to a preferred embodiment of the present invention, the merging module is specifically configured to:
merge the weights of the malicious URLs found by the query module to obtain the weight of the URL to be detected.
According to a preferred embodiment of the present invention, the merge calculation takes the maximum, the average, or the sum.
As can be seen from the above technical solutions, the method for building a malicious URL database and the method and device for identifying malicious URLs provided by the present invention take into account the associations across the entire underground industry chain. Known malicious URLs are expanded using the associations in site attribute information and the link relationships between websites on the Internet, and the malicious URL database is built according to the degree of association between each expanded URL and the known malicious URLs. An identification method based on this database does not rely on malicious code signatures and therefore has higher detection accuracy; it does not need to execute anything in a simulated browser environment and can judge URLs that have not yet been put into use. This improves the timeliness and accuracy of detection and reduces missed detections.
[Brief Description of the Drawings]
Fig. 1 is a flowchart of the method for building a malicious URL database provided by Embodiment 1 of the present invention;
Fig. 2 is a flowchart of the method for identifying malicious URLs provided by Embodiment 2 of the present invention;
Fig. 3 is a schematic diagram of the device for building a malicious URL database provided by Embodiment 3 of the present invention;
Fig. 4 is a schematic diagram of the device for identifying malicious URLs provided by Embodiment 4 of the present invention.
[Detailed Description]
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described below with reference to the drawings and specific embodiments.
Embodiment 1
Fig. 1 is a flowchart of the method for building a malicious URL database provided by this embodiment. As shown in Fig. 1, the method includes:
Step S101: associate each site domain name with its corresponding site attribute information in advance and build the site information association database.
A website usually contains many web pages, each with a corresponding web address. A web address is usually expressed as a URL (Uniform Resource Locator), typically in the form access protocol + domain name. For example, www.baidu.com contains many web pages; the URL of the Baidu home page is "http://www.baidu.com" and its domain name is "baidu.com". Because a site domain name is unique, it can be used to represent a website.
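The mapping from a URL to its site domain name can be sketched with Python's standard library. The function name is hypothetical, and stripping a leading "www." is a deliberate simplification assumed here: finding the registrable domain in general requires a public-suffix list, which this sketch does not use.

```python
from urllib.parse import urlsplit

def site_domain(url: str) -> str:
    """Return a simplified site domain for a URL.

    Naive assumption: the site domain is the host name with a leading
    'www.' removed. Hosts such as 'news.example.co.uk' are returned as-is.
    """
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

print(site_domain("http://www.baidu.com"))
```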
For a given domain name, tools such as whois can be used to query the registration information of the corresponding website. Registration information usually includes the site name, the domain name applied for, the site owner, the site owner's contact information (including organization name, person in charge, industry, mailing address, postal code, email, telephone number, fax number, and authentication information), and the host names and IP addresses of the name servers.
In the underground industry chain, one illegal website operator usually holds multiple malicious websites that form a group of similar sites. These malicious websites usually share site attribute information; for example, they may have the same site owner or the same name server. The associations between these pieces of site attribute information can be used to find an illegal operator's site group.
The site attribute information of websites on the Internet is used in advance to build the site information association database, so that the associations between websites can be queried.
Specifically, to build the site information association database, the registration information of websites on the Internet is first collected with the whois tool, including site name, site owner, site owner contact information, company information, and IP address information. Then the ICP (Internet Content Provider) information of each website is obtained by web crawling or similar methods, including company information, site filing number, site name, and site home page address. This information is associated with the site domain name, forming the association between site domain names and site attribute information, and the site information association database is built.
The site information association database may be stored as an indexed table, although it is not limited to this form. It contains the association between each site domain name and its site attribute information, where the site attribute information includes site name, site owner, site owner contact information, company information, IP address information, and so on.
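One way to picture such an index is an inverted map from each (attribute type, value) pair to the set of domains that carry it. The sketch below is an assumption about the storage layout, which the source does not specify beyond "table index"; the records, domain names, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical registration records: domain -> {attribute type: value}.
records = {
    "a.example": {"email": "op@mail.example", "ip": "203.0.113.5"},
    "b.example": {"email": "op@mail.example", "ip": "203.0.113.9"},
    "c.example": {"email": "x@mail.example", "ip": "203.0.113.5"},
}

def build_site_index(records):
    """Invert the records: (attribute type, value) -> set of domains."""
    index = defaultdict(set)
    for domain, attrs in records.items():
        for attr_type, value in attrs.items():
            index[(attr_type, value)].add(domain)
    return index

def related_domains(index, records, domain):
    """Return {other domain: set of attribute types shared with `domain`}."""
    shared = defaultdict(set)
    for attr_type, value in records[domain].items():
        for other in index[(attr_type, value)]:
            if other != domain:
                shared[other].add(attr_type)
    return dict(shared)

index = build_site_index(records)
print(related_domains(index, records, "a.example"))
```

Recording which attribute types are shared, not just which domains are related, matters later because each type carries its own influence factor.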
Step S102: build the backlink association database in advance, which stores the link relationships between URLs.
A web page may contain multiple outbound links to other pages; correspondingly, a web page may be linked from multiple other pages through inbound links.
A backlink, i.e. an inbound link, is a link in another web page that brings a URL into that page through anchor text or a path. The address of every web page that contains an inbound link to a URL is a backlink URL of that URL.
The link relationships between the URLs of these web pages are used to build the backlink association database. Web page content is crawled with existing methods such as a web crawler, the link relationships between URLs are stored, and the resulting backlink association database is used later to look up the backlinks of a URL.
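A crawler naturally yields outbound links per page, while the method needs the reverse lookup. A minimal sketch of that inversion follows; the crawl data and the function name are hypothetical, and a real database would persist the result rather than hold it in memory.

```python
from collections import defaultdict

def build_backlink_index(outlinks):
    """Invert page -> outbound links into target URL -> set of backlink URLs."""
    backlinks = defaultdict(set)
    for source, targets in outlinks.items():
        for target in targets:
            backlinks[target].add(source)
    return backlinks

# Hypothetical crawl result: each page and the URLs it links to.
crawled = {
    "http://p1.example/": ["http://bad.example/"],
    "http://p2.example/": ["http://bad.example/", "http://p1.example/"],
}
backlinks = build_backlink_index(crawled)
print(sorted(backlinks["http://bad.example/"]))
```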
Step S103: set different influence factors for different association relationships.
Two websites are associated when they have identical site attribute information. Association relationships differ in the type of site attribute information through which two websites are associated. Because these types differ, the degree of association between websites also differs. For example, websites registered with the same email address can essentially be judged to belong to the same registrant, whereas an identical IP address only indicates that the websites share a host IP.
Different influence factors are set for different association relationships according to the type of site attribute information; each preset influence factor corresponds to one type of site attribute information shared between site domain names. For example, an email factor with a fixed value of 0.9 is set for websites registered with the same email address, an IP factor with a fixed value of 0.8 is set for websites using the same IP address, and a backlink factor with a fixed value of 0.8 is set for websites in a backlink relationship. In short, an influence factor is set for each type of site attribute information shared between site domain names, and a backlink factor is set between URLs that have a backlink relationship.
The influence factors of all types include the backlink factor and the factors for each type of site attribute information, such as the email factor, the IP factor, the registered user name factor, the registered company factor, and the ICP factor. These influence factors α may be set according to existing empirical data, although they are not limited to this, where 0 < α < 1.
Step S104: obtain known malicious URLs and add them to the detection queue; take URLs out of the detection queue one by one and perform step S105 on each current URL.
The known malicious URLs may be URLs determined by existing antivirus software, by malicious-website monitoring technology that is updated daily, or by similar means. These malicious URLs are taken as input, assigned an initial weight, and added to the detection queue. At this point the detection queue contains each malicious URL and its initial weight.
The URLs in the detection queue are taken out one by one for detection, and step S105 is performed on the current URL.
Step S105: query the backlink association database, determine all backlink URLs of the current URL, and add to the detection queue the backlink URLs whose degree of association with the known malicious URLs meets the preset requirement.
Calculating the degree of association between a backlink URL and the known malicious URLs comprises: multiplying the weight of the current URL by the backlink factor to obtain the weight of each backlink URL.
In this step the retrieved backlink URLs are in a backlink relationship with the current URL, so the influence factor used is the backlink factor.
For a known malicious URL under detection, the weight used is its initial weight, which is 1. The weight of each backlink URL is obtained from the initial weight of the malicious URL and the backlink factor. If the backlink factor is set to 0.8, the weight of each backlink URL is 0.8*1=0.8.
The degree of association meets the preset requirement when the weight of the backlink URL exceeds a preset threshold. Backlink URLs whose weight exceeds the preset threshold are added to the detection queue. The preset threshold can be set from practical experience; for example, if it is set to 0.7, the backlink URLs whose weight is greater than 0.7 are added to the detection queue together with their weights.
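The backlink expansion of step S105 reduces to one multiplication and one comparison. The sketch below uses the example values from the text (backlink factor 0.8, threshold 0.7); the function name and the placeholder URL labels are assumptions.

```python
def expand_backlinks(current_weight, backlink_urls,
                     backlink_factor=0.8, threshold=0.7):
    """Return {backlink URL: weight} for the backlinks that pass the threshold.

    Every backlink of the current URL receives the same weight, so either
    all of them exceed the threshold or none do.
    """
    weight = current_weight * backlink_factor
    if weight > threshold:
        return {url: weight for url in backlink_urls}
    return {}

# A seed with initial weight 1 passes its backlinks on with weight 0.8.
print(expand_backlinks(1.0, ["url2", "url3"]))
# One hop further the weight drops below 0.7 and nothing is queued.
print(expand_backlinks(0.8, ["url4", "url5"]))
```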
Step S106: parse the site attribute information of the current URL, query the site information association database, determine the site domain names that share site attribute information with the current URL, and add to the detection queue the site domain names whose degree of association with the known malicious URLs meets the preset requirement.
Calculating the degree of association between a site domain name and the known malicious URLs comprises: multiplying the weight of the current URL by the influence factor corresponding to the type of site attribute information that the site domain name shares with the current URL, to obtain the weight of the site domain name.
The degree of association meets the preset requirement when the weight of the site domain name exceeds the preset threshold.
Specifically, the influence factor is first determined from the type of site attribute information shared between each site domain name and the current URL. The weight of the current URL is multiplied by each corresponding influence factor to obtain the weight of each site domain name, and the site domain names whose weight exceeds the preset threshold are added to the detection queue.
The site domain name of the current URL is extracted and queried with the whois tool to obtain the site attribute information of the current URL, including site name, site owner, site owner email, company name, ICP number, and so on. This site attribute information is matched against the site information association database to find the site domain names with the same attributes, and the type of site attribute information through which each of those domain names is associated with the current URL is recorded so that the influence factor can be determined.
Each influence factor is the one corresponding to the type of site attribute information through which a site domain name is associated with the current URL. For example, if site domain name A has the same email address as the current URL, the weight of domain name A is the weight of the current URL multiplied by the email factor. If site domain name B has the same IP address as the current URL, the weight of domain name B is the weight of the current URL multiplied by the IP factor. The weights of the other site domain names are calculated in the same way.
If a site domain name is associated with the current URL through more than one influence factor, for example when both the email address and the registered user name are identical, the larger of the two influence factors can be used as the influence factor between the site domain name and the current URL. Alternatively, different weights that sum to 1 can be assigned to the different types of site attribute information; if several types of site attribute information are identical, the coefficients corresponding to those types are weighted to determine the influence factor.
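The two ways of combining several shared attribute types can be sketched as follows. The email and IP factors are the example values from the text; the registrant factor of 0.8 and the per-type weights are assumptions made here. The weighted variant is only one possible reading of the text, which does not spell out the exact formula.

```python
# Influence factors per attribute type (the registrant value is assumed).
FACTORS = {"email": 0.9, "registrant": 0.8, "ip": 0.8}

# Hypothetical per-type weights that sum to 1 across all attribute types.
TYPE_WEIGHTS = {"email": 0.5, "registrant": 0.3, "ip": 0.2}

def factor_max(shared_types, factors=FACTORS):
    """Option 1: use the largest factor among the shared attribute types."""
    return max(factors[t] for t in shared_types)

def factor_weighted(shared_types, factors=FACTORS, type_weights=TYPE_WEIGHTS):
    """Option 2 (one reading): weighted sum over the shared attribute types.

    Because the type weights sum to 1 and each factor is below 1,
    the result stays below 1.
    """
    return sum(type_weights[t] * factors[t] for t in shared_types)

shared = {"email", "registrant"}
print(factor_max(shared))
print(round(factor_weighted(shared), 2))
```

Under this reading the weighted option rewards domains that match on more attribute types, whereas the maximum ignores every match except the strongest one.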
The site domain names whose weight exceeds the preset threshold are added to the detection queue. The preset threshold is the same as in step S105.
Note that the order of steps S105 and S106 can be swapped, and detection can also be performed using only one of the two.
Step S107: take the next URL or site domain name out of the detection queue and repeat steps S105 and S106 until the detection queue is empty; then build the malicious URL database from all URLs or site domain names that appeared in the detection queue and their corresponding site attribute information.
Because a site domain name is a special case of a URL, a site domain name in the URL database points to the home page of that site. A site domain name can therefore be converted to the site's home page URL, so that the malicious URL database represents everything uniformly as URLs.
Because every influence factor satisfies 0 < α < 1, the computed weights become smaller with each repetition, so the process converges. Once the weights of all newly found URLs are below the preset threshold, nothing more is added to the detection queue; when the queue is empty, a closure of associated suspect sites has been collected.
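The convergence argument can be made quantitative. The notation below is introduced here for illustration: $w_0$ is the initial weight, $\alpha_{\max}$ the largest factor in use, $\theta$ the preset threshold, and $k$ the number of association hops from a seed URL.

```latex
\begin{aligned}
w_k &\le w_0 \, \alpha_{\max}^{\,k}, \qquad 0 < \alpha_{\max} < 1,\\
w_k > \theta \;&\Longrightarrow\; k < \frac{\ln(\theta / w_0)}{\ln \alpha_{\max}} .
\end{aligned}
```

With the example values used in this embodiment ($w_0 = 1$, $\theta = 0.7$, $\alpha_{\max} = 0.9$), the bound is $k < 3.39$, so no item more than 3 hops from a seed can be queued: $0.9^3 = 0.729 > 0.7$ but $0.9^4 = 0.6561 < 0.7$. Since the bound depends only on the hop count, it holds even when the link graph contains cycles.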
Utilize these all url occurred in queue to be detected or website attribute information corresponding to website domain name, those url or website domain name and weights, be saved in data base, build malice network address database, form a underground industry data database.Maliciously network address database can be, but not limited to use the mode of table index to store, including the url information collected, email address information, domain name (domain) information, ICP information, IP address information etc..
For example, suppose the known malicious network address obtained is url1. It is given an initial weight, for example 1, and added to the queue to be detected. A url is then taken out of the queue, here url1, and analyzed as the current url.
url1 is used to look up, in the anti-chain linked database, all anti-chain url corresponding to this malicious network address; suppose these are url2 and url3. The weight of url1 (i.e. the initial weight) is multiplied by the set anti-chain factor to give the weights of url2 and url3. If the anti-chain factor is set to 0.8, the weights of url2 and url3 are 0.8*1=0.8. Anti-chain url whose weights exceed the predetermined threshold are added to the queue to be detected; if the predetermined threshold is 0.7, both url2 and url3 are added.
The corresponding domain name, for example www.xxx123.com, is extracted from url1, and a tool such as whois is used to query the website attribute information corresponding to url1, including the website name, website owner, website owner's email, company name, IP address, ICP number, etc. This website attribute information is matched in the site information linked database to find website domain names with the same attributes, for example domain name 1, which has the same email address, and domain name 2, which has the same IP address. The weights of domain name 1 and domain name 2 are then calculated. If the email factor is set to 0.9 and the IP factor to 0.8, the weight of domain name 1 is the product of the initial weight and the email factor, 0.9*1=0.9, and the weight of domain name 2 is the product of the initial weight and the IP factor, 0.8*1=0.8. Since the weights of domain name 1 and domain name 2 also exceed the predetermined threshold 0.7, both are added to the queue to be detected.
The next url or website domain name is then taken out, say url2, and the detection is repeated.
url2 is used to look up, in the anti-chain linked database, all anti-chain url corresponding to url2; suppose these are url4 and url5. The weight of url2 is multiplied by the set anti-chain factor 0.8 to give the weights of url4 and url5, which are 0.8*0.8=0.64. Since the weights of url4 and url5 are both below the predetermined threshold 0.7, neither is added to the queue to be detected.
The corresponding domain name is extracted from url2, and a tool such as whois is used to query the website attribute information corresponding to url2. This website attribute information is matched in the site information linked database to find website domain names with the same attributes, for example domain name 3, which has the same email address, and domain name 4, which has the same registered company. The weight of domain name 3 is 0.8*0.9=0.72; if the registered company factor is set to 0.8, the weight of domain name 4 is 0.8*0.8=0.64. Since the weight of domain name 3 exceeds the predetermined threshold 0.7, domain name 3 is added to the queue to be detected; the weight of domain name 4 is below the threshold, so it is not added.
By analogy, steps S105 and S106 are repeated until the queue to be detected is empty, yielding the information about url1, url2, url3, domain name 1, domain name 2, domain name 3, etc. together with their weights, from which the malicious network address database is built.
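The worked example above can be reproduced with a short Python sketch. This is only an illustration under stated assumptions: the function name `propagate`, the dictionary layout, and the rule that a node keeps the first weight it receives are not taken from the patent text, and the toy data simply encodes url1 to url5 and domain names 1 to 4 from the example.

```python
from collections import deque

def propagate(seeds, backlinks, shared_attrs, factors, threshold=0.7):
    """Spread weights outward from known malicious urls.

    seeds:        known malicious urls, each given initial weight 1.0
    backlinks:    url -> list of its anti-chain urls
    shared_attrs: url -> list of (domain, shared attribute type)
    factors:      attribute type -> factor in (0, 1); the key
                  "backlink" holds the anti-chain factor
    """
    weights = {seed: 1.0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        candidates = [(u, "backlink") for u in backlinks.get(current, [])]
        candidates += shared_attrs.get(current, [])
        for node, kind in candidates:
            new_weight = weights[current] * factors[kind]
            # Assumption: a node is queued once and keeps its first weight.
            if new_weight > threshold and node not in weights:
                weights[node] = new_weight
                queue.append(node)
    return weights

# Toy data mirroring the example in the text.
backlinks = {"url1": ["url2", "url3"], "url2": ["url4", "url5"]}
shared_attrs = {
    "url1": [("domain1", "email"), ("domain2", "ip")],
    "url2": [("domain3", "email"), ("domain4", "company")],
}
factors = {"backlink": 0.8, "email": 0.9, "ip": 0.8, "company": 0.8}

weights = propagate(["url1"], backlinks, shared_attrs, factors)
for name, weight in weights.items():
    print(name, round(weight, 2))
```

With this data the queue empties after url1, url2, url3 and domain names 1 to 3 have been collected, while url4, url5 and domain name 4 stay below the threshold of 0.7.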
The malicious network address database thus built can be used to detect whether an unknown url is malicious. In one approach, the url to be detected is obtained directly and the malicious network address database is queried for it; if the database contains the url, the url to be detected is determined to be a malicious network address. A url that cannot be found directly in the malicious network address database can still be identified using records that contain related information. The recognition method for malicious network addresses provided by the present invention is described below through embodiment two.
Embodiment two
Fig. 2 is a flow chart of the recognition method for malicious network addresses provided by this embodiment. As shown in Fig. 2, the method includes:
Step S201: obtain the url to be detected and resolve its website attribute information.
For the url to be detected, the corresponding domain name is extracted, and a tool such as whois is used to query the website attribute information of the url, including the website name, website owner, website owner's email, company name, IP address, ICP number, and similar information.
Step S202: using the resolved website attribute information, look up in the malicious network address database the malicious network addresses that have the same attribute information as the url to be detected; the malicious network address database is built using the method described in embodiment one.
In the malicious network address database built in embodiment one, the website attribute information of the url to be detected is used to extract the malicious url whose records contain that website attribute information, yielding a batch of malicious url associated with the url to be detected.
Step S203: calculate the weight of the url to be detected from the weights of the malicious network addresses found.
The weights of the malicious network addresses found in step S202 are merged to obtain the weight of the url to be detected. The merge calculation may take the maximum, the average, or the sum. Preferably, the maximum of the weights corresponding to the malicious url found is chosen as the weight of the url to be detected.
For a malicious url that is found repeatedly, a weight adjustment can also be applied during the merge calculation by adding a preset weight adjustment factor. When a url is judged suspicious through several different data sources, the suspicion that it is a malicious network address is higher.
Step S204: judge whether the weight of the url to be detected exceeds a predetermined threshold; if so, the url to be detected is identified as a malicious url.
The predetermined threshold may be the same as that used in steps S105 and S106 of embodiment one, or it may be set separately as a fixed value.
Thus, for an unknown url, the established malicious network address database can be used to determine whether it is a malicious network address.
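Steps S202 to S204 can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: the record layout, the function names `score_url` and `is_malicious`, and the sample email and IP values (taken from documentation-reserved ranges) are all assumptions made for illustration. The weight adjustment for repeated hits is left out because the text does not specify it.

```python
from statistics import mean

# The three merge calculations named in the text.
MERGERS = {"max": max, "mean": mean, "sum": sum}

def score_url(attrs, malicious_db, merge="max"):
    """Merge the weights of malicious records that share an attribute value."""
    hits = [
        record["weight"]
        for record in malicious_db
        if any(record["attrs"].get(kind) == value for kind, value in attrs.items())
    ]
    return MERGERS[merge](hits) if hits else 0.0

def is_malicious(attrs, malicious_db, threshold=0.7, merge="max"):
    """Step S204: compare the merged weight with the predetermined threshold."""
    return score_url(attrs, malicious_db, merge) > threshold

# Hypothetical database records (weights as in embodiment one's example).
malicious_db = [
    {"url": "url1", "weight": 1.0,
     "attrs": {"email": "owner@example.com", "ip": "192.0.2.1"}},
    {"url": "domain3", "weight": 0.72,
     "attrs": {"email": "owner@example.com", "ip": "192.0.2.9"}},
]

suspect = {"email": "owner@example.com", "ip": "198.51.100.7"}
unrelated = {"email": "someone@example.org", "ip": "203.0.113.5"}

suspect_score = score_url(suspect, malicious_db)            # preferred: maximum
suspect_mean = score_url(suspect, malicious_db, merge="mean")
suspect_flag = is_malicious(suspect, malicious_db)
unrelated_flag = is_malicious(unrelated, malicious_db)
```

Taking the maximum, as the text prefers, means a single strong association is enough to flag a url, whereas averaging dilutes it with weaker matches and summing can exceed 1.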
The above is a detailed description of the method provided by the present invention. The device for building the malicious network address database and the device for identifying malicious network addresses provided by the present invention are described in detail below.
Embodiment three
Fig. 3 is a schematic diagram of the device for building a malicious network address database provided by this embodiment. As shown in Fig. 3, the device includes:
Site information relating module 301, for associating each website domain name with its corresponding website attribute information in advance, building the site information linked database.
Site information relating module 301 uses the website attribute information of the websites existing on the Internet to build the site information linked database in advance, so that the association relationships between websites can be queried.
Specifically, when building the site information linked database, the registration information of the websites existing on the Internet is first collected with the whois tool, including the website name, website owner, website owner's contact information, company information, IP address information, etc. Then, by methods such as web crawling, the ICP (Internet Content Provider) information of each website is obtained, including the company information, website filing number, website name, website homepage address, and similar information. This information is associated with the website domain name, forming the association relationship between website domain names and website attribute information, and the site information linked database is built.
The site information linked database may be, but is not limited to being, stored as indexed tables, and includes the association relationship between each website domain name and its corresponding website attribute information, where the website attribute information includes the website name, website owner, website owner's contact information, company information, IP address information, etc.
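One way to picture the "indexed table" storage is an inverted index from attribute values to domain names, sketched below in Python. The text does not prescribe this layout; the function names, the attribute keys `email` and `ip`, and the sample domains and values are hypothetical.

```python
from collections import defaultdict

def build_site_index(records):
    """Map (attribute type, value) -> set of domain names with that value."""
    index = defaultdict(set)
    for domain, attrs in records.items():
        for kind, value in attrs.items():
            index[(kind, value)].add(domain)
    return index

def related_domains(domain, records, index):
    """Return {other domain: set of attribute types it shares with domain}."""
    shared = defaultdict(set)
    for kind, value in records[domain].items():
        for other in index[(kind, value)]:
            if other != domain:
                shared[other].add(kind)
    return dict(shared)

# Hypothetical registration data, e.g. as collected with whois.
records = {
    "www.xxx123.com": {"email": "owner@example.com", "ip": "192.0.2.1"},
    "domain1.example": {"email": "owner@example.com", "ip": "192.0.2.50"},
    "domain2.example": {"email": "other@example.com", "ip": "192.0.2.1"},
}

index = build_site_index(records)
related = related_domains("www.xxx123.com", records, index)
```

Recording which attribute type produced each match matters later, because the factor of influence applied to a domain name depends on that type.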
Anti-chain relating module 302, for building the anti-chain linked database in advance and saving the linking relationships between url.
A webpage may contain multiple outbound links to other webpages; correspondingly, a webpage may also be referenced by multiple other webpages through inbound links.
An anti-chain, i.e. an inbound link (backlink), is a link by which another webpage incorporates a url into its own page through a piece of source text or a path. The address of every webpage that contains an inbound link to this url is an anti-chain url of this url.
Anti-chain relating module 302 uses the linking relationships between the url corresponding to these webpages to build the anti-chain linked database. Web page content is crawled with existing methods such as a web crawler, the linking relationships between url are saved, and the anti-chain linked database is obtained so that the anti-chains of a url can be looked up later.
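Building the anti-chain linked database amounts to inverting the outbound links found by the crawler. The Python sketch below assumes the crawl result is already available as a mapping from each page to the url it links to; the function name and the sample links are illustrative only.

```python
from collections import defaultdict

def build_backlink_index(outlinks):
    """Invert page -> linked urls into url -> pages that link to it."""
    backlinks = defaultdict(set)
    for page, targets in outlinks.items():
        for target in targets:
            backlinks[target].add(page)
    return backlinks

# Hypothetical crawl result: url2 and url3 both link to url1.
outlinks = {
    "url2": ["url1", "url9"],
    "url3": ["url1"],
    "url4": ["url2"],
}

backlink_index = build_backlink_index(outlinks)
anti_chain_of_url1 = backlink_index["url1"]
```

Storing the inverted form means the anti-chain url of the current url can be read off with a single lookup during detection.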
Factor setting module 303, for setting an anti-chain factor between url that have an anti-chain relationship, and for setting a factor of influence for each type of website attribute information shared between website domain names.
The anti-chain factor and the factors of influence take values in the interval (0,1).
Factor setting module 303 sets different factors of influence for different association relationships according to the type of website attribute information shared between website domain names. For example, for websites registered with the same email address an email factor is set, with fixed value 0.9; for websites using the same IP address an IP factor is set, with fixed value 0.8; and for url in an anti-chain relationship the anti-chain factor is set, with fixed value 0.8.
The factors of influence include the anti-chain factor and the factors for each type of website attribute information, such as the email factor, the IP factor, the registered user name factor, the registered company factor, and the ICP factor. These different types of factor of influence α may be, but are not limited to being, set according to existing empirical data, where 0 < α < 1.
Database module 304, for obtaining the url of known malicious network addresses and adding them to the queue to be detected, taking url out of the queue to be detected one by one and supplying the current url to anti-chain detection module 305 or site information detection module 306 until the queue to be detected is empty, and building the malicious network address database from all url or website domain names that were added to the queue to be detected.
Known malicious network addresses may be network addresses determined by means such as existing antivirus software or malicious-website monitoring technology that is updated daily. These malicious network addresses are taken as input, each is given an initial weight, and they are added to the queue to be detected. At this point the queue to be detected contains each malicious network address and its initial weight.
The network addresses (url) in the queue to be detected are taken out one by one and examined by anti-chain detection module 305 or site information detection module 306.
Anti-chain detection module 305, for querying the anti-chain linked database, determining all anti-chain url of the current url supplied by Database module 304, and adding to the queue to be detected those anti-chain url whose degree of association with the url of a known malicious network address meets a preset requirement.
Anti-chain detection module 305 multiplies the weight of the current url by the anti-chain factor to obtain the weight of each anti-chain url; the weight of an anti-chain url reflects its degree of association with the url of a known malicious network address. Anti-chain url whose weights exceed the predetermined threshold are added to the queue to be detected.
For a known malicious network address, the weight used is its initial weight, which is 1. The weight of each anti-chain url is obtained from the initial weight of the malicious network address and the anti-chain factor. If the anti-chain factor is set to 0.8, the weight of each anti-chain url is 0.8*1=0.8.
Anti-chain detection module 305 adds the anti-chain url whose weights exceed the predetermined threshold to the queue to be detected. The predetermined threshold can be set from practical experience; for example, with a predetermined threshold of 0.7, anti-chain url with weights greater than 0.7 are added to the queue to be detected together with their weights.
Site information detection module 306, for resolving the website attribute information of the current url, querying the site information linked database, determining the website domain names that share website attribute information with the current url supplied by Database module 304, and adding to the queue to be detected those website domain names whose degree of association with the url of a known malicious network address meets a preset requirement.
Site information detection module 306 first determines the corresponding factor of influence according to the type of website attribute information shared between each website domain name and the current url. The weight of the current url is multiplied by the factor of influence corresponding to the type of website attribute information that the website domain name shares with the current url, giving the weight of the website domain name; this weight reflects the degree of association between the website domain name and the url of a known malicious network address. Website domain names whose weights exceed the predetermined threshold are added to the queue to be detected.
The website domain name corresponding to the current url is extracted and queried with the whois tool to obtain the website attribute information corresponding to the current url, including the website name, website owner, website owner's email, company name, ICP number, etc. This website attribute information is matched in the site information linked database to find website domain names with the same attributes, and the type of website attribute information through which each such domain name is associated with the current url is recorded so that the corresponding factor of influence can be determined.
Each factor of influence is the one corresponding to the type of website attribute information through which a website domain name is associated with the current url. For example, if website domain name A has the same email address as the current url, the weight of website domain name A is the product of the weight of the current url and the email factor. If website domain name B has the same IP address as the current url, the weight of website domain name B is the product of the weight of the current url and the IP factor. The weights of the other website domain names are calculated in the same way.
If a website domain name is associated with the current url through several factors of influence, for example when both the email address and the registered user name are the same, the maximum of those factors can be chosen as the factor of influence between the website domain name and the current url. Alternatively, different weighting coefficients that sum to 1 can be assigned to the different types of website attribute information; if several types of website attribute information are the same, the coefficients corresponding to those types are weighted to determine the factor of influence. Website domain names whose weights exceed the predetermined threshold are added to the queue to be detected.
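The two ways of handling several shared attribute types can be sketched in Python as below. The maximum rule follows the text directly. The weighted rule is only one possible reading of the alternative, namely summing each matching factor multiplied by its coefficient; the text does not give the exact formula. The registered user name factor 0.85 and the coefficients 0.6 and 0.4 are hypothetical values chosen for illustration.

```python
def combined_factor(shared_types, factors, coefficients=None):
    """Factor of influence for a domain name sharing several attribute types.

    With coefficients=None the largest matching factor is used.
    Otherwise each matching factor is multiplied by its coefficient and
    the products are summed (an assumed reading of the weighted variant).
    """
    if not shared_types:
        raise ValueError("at least one shared attribute type is required")
    if coefficients is None:
        return max(factors[kind] for kind in shared_types)
    return sum(coefficients[kind] * factors[kind] for kind in shared_types)

# Only the email factor 0.9 comes from the text; the rest is hypothetical.
factors = {"email": 0.9, "username": 0.85}
coefficients = {"email": 0.6, "username": 0.4}  # sums to 1

shared = {"email", "username"}
by_max = combined_factor(shared, factors)
by_weighting = combined_factor(shared, factors, coefficients)
print(by_max, round(by_weighting, 2))
```

Under this reading the weighted result stays inside (0,1) because the coefficients sum to 1 and every factor is below 1, so the convergence property of the propagation is preserved.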
Database module 304 then continues taking url out of the queue to be detected one by one and triggering anti-chain detection module 305 or site information detection module 306 for the current url until the queue to be detected is empty, and builds the malicious network address database from all url or website domain names that were added to the queue to be detected.
A website domain name is a special case of a url: in the url library, a website domain name points to the homepage of the website. A website domain name can therefore be converted into the url of the website homepage, so that entries in the malicious network address database are uniformly represented as url.
Because the factors of influence satisfy 0 < α < 1, the weights calculated for url become smaller and smaller as the process repeats, so the process converges. When the weights of all newly calculated url are below the predetermined threshold, nothing more is added to the queue to be detected; once the queue is empty, a closure of a batch of associated suspicious sites has been collected.
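The convergence argument can be made concrete: starting from an initial weight of 1, a url reached after k association steps has a weight of at most the largest factor raised to the power k, so only a bounded number of steps can stay above the threshold. The Python sketch below computes that bound; the function name `max_hops` is an assumption, and the factor and threshold values are the ones used in the examples of the text.

```python
def max_hops(largest_factor, threshold):
    """Largest k such that largest_factor**k still exceeds the threshold,
    starting from an initial weight of 1."""
    if not (0 < largest_factor < 1 and 0 < threshold < 1):
        raise ValueError("expects 0 < factor < 1 and 0 < threshold < 1")
    hops, weight = 0, 1.0
    while weight * largest_factor > threshold:
        weight *= largest_factor
        hops += 1
    return hops

# Email factor 0.9 and threshold 0.7: 0.9, 0.81 and 0.729 exceed 0.7,
# but 0.6561 does not.
email_chain_depth = max_hops(0.9, 0.7)
# Anti-chain factor 0.8 and threshold 0.7: only 0.8 exceeds 0.7.
anti_chain_depth = max_hops(0.8, 0.7)
```

With these values a chain of email associations can extend at most three steps from a known malicious url, and a chain of anti-chain associations only one, which is why url4 and url5 are dropped in the worked example of embodiment one.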
The website attribute information and weights corresponding to all url or website domain names that have appeared in the queue to be detected are saved in a database together with those url and website domain names, building the malicious network address database and forming a database of underground-industry data. The malicious network address database may be, but is not limited to being, stored as indexed tables, and includes the collected url information, email address information, domain name (domain) information, ICP information, IP address information, and so on.
The malicious network address database thus built can be used to detect whether an unknown url is malicious. One identification device may include an inquiry judging module, which obtains the url to be detected directly and queries whether the malicious network address database contains it; if so, the url to be detected is determined to be a malicious network address. A url that cannot be found directly in the malicious network address database can still be identified using records that contain related information. The identification device for malicious network addresses provided by the present invention is described below through embodiment four.
Embodiment four
Fig. 4 is a schematic diagram of the identification device for malicious network addresses provided by this embodiment. As shown in Fig. 4, the device includes:
Parsing module 401, for obtaining the url to be detected and resolving its website attribute information.
For the url to be detected, parsing module 401 extracts the corresponding domain name and uses a tool such as whois to query the website attribute information of the url, including the website name, website owner, website owner's email, company name, IP address, ICP number, and similar information.
Enquiry module 402, for using the resolved website attribute information to look up, in the malicious network address database, the malicious network addresses that have the same attribute information as the url to be detected; the malicious network address database is built using the device described in embodiment three.
Enquiry module 402 uses the website attribute information of the url to be detected to extract the malicious url whose records contain that website attribute information, obtaining a batch of malicious url associated with the url to be detected.
Merge module 403, for calculating the weight of the url to be detected from the weights of the malicious network addresses found by enquiry module 402.
The weights of the malicious network addresses found by enquiry module 402 are merged to obtain the weight of the url to be detected. The merge calculation may take the maximum, the average, or the sum. Preferably, the maximum of the weights corresponding to the malicious url found is chosen as the weight of the url to be detected.
For a malicious url that is found repeatedly, a weight adjustment can also be applied during the merge calculation by adding a preset weight adjustment factor. When a url is judged suspicious through several different data sources, the suspicion that it is a malicious network address is higher.
Judge module 404, for judging whether the weight of the url to be detected exceeds a predetermined threshold; if so, the url to be detected is identified as a malicious url.
For an unknown url, the established malicious network address database can thus be used to determine whether it is a malicious network address.
The method for building a malicious network address database and the method and device for identifying malicious network addresses provided by the present invention take into account the associations across the entire underground industry chain and use the association data of website attribute information between websites on the Internet to build the malicious network address database. Unknown network addresses can thereby be judged without being executed, which improves the timeliness and accuracy of detection and reduces missed detections.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (16)

1. A method for building a malicious network address database, characterized in that the method includes:
S1, associating each website domain name with its corresponding website attribute information in advance, and building a site information linked database;
S2, building an anti-chain linked database in advance, saving the linking relationships between url;
S3, obtaining the url of known malicious network addresses and adding them to a queue to be detected, taking url out of the queue to be detected one by one and performing step S4 on each current url taken out, until the queue to be detected is empty, and building the malicious network address database from all url or website domain names added to the queue to be detected;
S4, querying the anti-chain linked database, determining all anti-chain url of the current url, and adding the anti-chain url whose weights exceed a predetermined threshold to the queue to be detected, wherein the weight of an anti-chain url is obtained by multiplying the weight of the current url by an anti-chain factor; or,
resolving the website attribute information of the current url, querying the site information linked database, determining the website domain names that have the same website attribute information as the current url, and adding the website domain names whose weights exceed the predetermined threshold to the queue to be detected, wherein the weight of a website domain name is obtained by multiplying the weight of the current url by the factor of influence corresponding to the type of website attribute information shared by the website domain name and the current url.
2. The method according to claim 1, characterized in that the website attribute information includes at least one of the following: website name, website owner, website owner's contact information, company information, IP address information, ICP information.
3. The method according to claim 1, characterized in that step S3 further includes: giving an initial weight to the url of the malicious network address, setting an anti-chain factor between url that have an anti-chain relationship, and setting a factor of influence for each type of website attribute information shared between website domain names, the anti-chain factor and the factor of influence taking values in the interval (0,1);
wherein, if the webpage of one url contains an inbound link of another url, the two url have an anti-chain relationship.
4. The method according to claim 3, characterized in that the malicious network address database further includes: the website attribute information and weights corresponding to all url or website domain names added to the queue to be detected.
5. A method for identifying a malicious network address, characterized in that the method includes:
obtaining a url to be detected, and querying whether a malicious network address database contains the url to be detected; if so, determining that the url to be detected is a malicious network address;
wherein the malicious network address database is built using the method according to any one of claims 1 to 4.
6. A method for identifying a malicious network address, characterized in that the method includes:
S201, obtaining a url to be detected and resolving the website attribute information of the url;
S202, using the resolved website attribute information to look up, in a malicious network address database, the malicious network addresses that have the same attribute information as the url to be detected, the malicious network address database being built using the method according to claim 4;
S203, calculating the weight of the url to be detected from the weights of the malicious network addresses found;
S204, judging whether the weight of the url to be detected exceeds a predetermined threshold; if so, identifying the url to be detected as a malicious url.
7. The method according to claim 6, characterized in that step S203 is specifically:
merging the weights of the malicious network addresses found in step S202 to obtain the weight of the url to be detected.
8. The method according to claim 7, characterized in that the merge calculation takes the maximum, the average, or the sum.
9. A device for building a malicious network address database, characterized in that the device includes:
a site information relating module, for associating each website domain name with its corresponding website attribute information in advance, and building a site information linked database;
an anti-chain relating module, for building an anti-chain linked database in advance, saving the linking relationships between url;
a Database module, for obtaining the url of known malicious network addresses and adding them to a queue to be detected, taking url out of the queue to be detected one by one and supplying the current url taken out to an anti-chain detection module or a site information detection module, until the queue to be detected is empty, and building the malicious network address database from all url or website domain names added to the queue to be detected;
the anti-chain detection module, for querying the anti-chain linked database, determining all anti-chain url of the current url supplied by the Database module, and adding the anti-chain url whose weights exceed a predetermined threshold to the queue to be detected, wherein the weight of an anti-chain url is obtained by multiplying the weight of the current url by an anti-chain factor;
the site information detection module, for resolving the website attribute information of the current url, querying the site information linked database, determining the website domain names that have the same website attribute information as the current url supplied by the Database module, and adding the website domain names whose weights exceed the predetermined threshold to the queue to be detected, wherein the weight of a website domain name is obtained by multiplying the weight of the current url by the factor of influence corresponding to the type of website attribute information shared by the website domain name and the current url.
10. The device according to claim 9, characterized in that the website attribute information includes at least one of the following: website name, website owner, website owner's contact information, company information, IP address information, ICP information.
11. The device according to claim 9, characterized in that the device further includes:
a factor setting module, for setting an anti-chain factor between url that have an anti-chain relationship, and for setting a factor of influence for each type of website attribute information shared between website domain names, the anti-chain factor and the factor of influence taking values in the interval (0,1);
the Database module being further used for giving an initial weight to the url of the malicious network address;
wherein, if the webpage of one url contains an inbound link of another url, the two url have an anti-chain relationship.
12. The device according to claim 11, characterized in that the malicious network address database further includes: the website attribute information and weights corresponding to all url or website domain names added to the queue to be detected.
13. A device for identifying a malicious network address, characterized in that the device includes: an inquiry judging module, for obtaining a url to be detected and querying whether a malicious network address database contains the url to be detected; if so, determining that the url to be detected is a malicious network address;
wherein the malicious network address database is built using the device according to any one of claims 9 to 12.
14. A device for identifying a malicious network address, characterized in that the device includes:
a parsing module, for obtaining a url to be detected and resolving the website attribute information of the url;
an enquiry module, for using the resolved website attribute information to look up, in a malicious network address database, the malicious network addresses that have the same attribute information as the url to be detected, the malicious network address database being built using the device according to claim 12;
a merge module, for calculating the weight of the url to be detected from the weights of the malicious network addresses found;
a judge module, for judging whether the weight of the url to be detected exceeds a predetermined threshold; if so, identifying the url to be detected as a malicious url.
15. The device according to claim 14, characterized in that the merge module is specifically configured for:
merging the weights of the malicious network addresses found by the enquiry module to obtain the weight of the url to be detected.
16. The device according to claim 15, characterized in that the merge calculation takes the maximum, the average, or the sum.
CN201210069443.7A 2012-03-15 2012-03-15 Method for building a malicious network address database, and method and device for identifying malicious network addresses Active CN102663000B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210069443.7A CN102663000B (en) 2012-03-15 2012-03-15 Method for building a malicious network address database, and method and device for identifying malicious network addresses

Publications (2)

Publication Number Publication Date
CN102663000A CN102663000A (en) 2012-09-12
CN102663000B true CN102663000B (en) 2016-08-03

Family

ID=46772491

Country Status (1)

Country Link
CN (1) CN102663000B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5732264A (en) * 1994-11-08 1998-03-24 Matsushita Electric Industrial Co., Ltd. Information management system and method for managing, processing storing and displaying attribute information of object information
CN101547197A (en) * 2009-04-30 2009-09-30 珠海金山软件股份有限公司 A URL washing device and a washing method
CN102045360A (en) * 2010-12-27 2011-05-04 成都市华为赛门铁克科技有限公司 Method and device for processing malicious website library
CN102045358A (en) * 2010-12-29 2011-05-04 深圳市永达电子股份有限公司 Intrusion detection method based on integral correlation analysis and hierarchical clustering
CN102096683A (en) * 2009-12-11 2011-06-15 奇智软件(北京)有限公司 Method for realizing nameplate at browser address bar

Also Published As

Publication number Publication date
CN102663000A (en) 2012-09-12

Similar Documents

Publication Publication Date Title
CN102663000B (en) Method for establishing a malicious network address database, and method and device for identifying malicious network addresses
US9218482B2 (en) Method and device for detecting phishing web page
CN103778151B (en) The method and device and searching method and device of a kind of identification feature colony
CN102724187B (en) A kind of safety detection method for network address and device
CN102739653B (en) Detection method and device aiming at webpage address
CN102833258A (en) Website access method and system
CN103559235A (en) Online social network malicious webpage detection and identification method
CN102710795B (en) Hotspot collecting method and device
CN105760379B (en) Method and device for detecting webshell page based on intra-domain page association relation
CN104579773A (en) Domain name system analysis method and device
CN107341399B (en) Method and device for evaluating security of code file
CN105049301A (en) Method and device for providing comprehensive evaluation services of websites
CN107437026B (en) Malicious webpage advertisement detection method based on advertisement network topology
CN102868773A (en) Method, device and system for detecting domain name system (DNS) black hole hijack
CN106095979B (en) URL merging processing method and device
CN103220277B (en) The monitoring method of cross-site scripting attack, Apparatus and system
Cui et al. Malicious URL detection with feature extraction based on machine learning
CN104239582A (en) Method and device for identifying phishing webpage based on feature vector model
CN104717226A (en) Method and device for detecting website address
Sardar et al. Detection and confirmation of web robot requests for cleaning the voluminous web log data
CN105187439A (en) Phishing website detection method and device
CN103793508B (en) A kind of loading recommendation information, the methods, devices and systems of network address detection
CN103810268B (en) Search result recommendation information loading method, device and system and URL detection method, device and system
CN105138912A (en) Method and device for generating phishing website detection rules automatically
EP3398311B1 (en) Method and system for preserving privacy in an http communication between a client and a server

Legal Events

Date Code Title Description
PB01 Publication
C06 Publication
SE01 Entry into force of request for substantive examination
C10 Entry into substantive examination
GR01 Patent grant
C14 Grant of patent or utility model