CN108270754B - Detection method and device for phishing website

Info

Publication number: CN108270754B
Application number: CN201710001394.6A
Authority: CN
Inventors: 郭智慧
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Communications Ltd Research Institute
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Communications Ltd Research Institute
Priority date: 2017-01-03
Filing date: 2017-01-03
Publication date: 2021-08-06
Anticipated expiration: 2037-01-03
Also published as: CN108270754A

Abstract

The invention provides a method and a device for detecting a phishing website, relates to the technical field of network security, and aims to improve the timeliness of system processing. The detection method of the phishing website comprises the following steps: acquiring first domain name information of a website to be detected and second domain name information of a target website; if the first domain name information is similar to the second domain name information, taking the website to be detected as a website to be confirmed; acquiring first website page content identification information of the website to be confirmed and second website page content identification information of a target website; and if the first website page content identification information is the same as the second website page content identification information, determining that the website to be confirmed is a phishing website. The invention is mainly used in the phishing website detection technology.

Description

Detection method and device for phishing website

Technical Field

The invention relates to the technical field of network security, in particular to a method and a device for detecting a phishing website.

Background

Because the internet is borderless, the sources of phishing websites and the harm they cause are spread across national boundaries, which has made phishing a troublesome global concern. Phishing websites currently appear frequently around the world, causing great harm to the public interest, reducing public confidence in using the internet, and seriously hindering the development of online financial services and electronic commerce.

A phishing website is a form of online fraud in which criminals use various means to imitate the URL (Uniform Resource Locator) address and page content of a genuine website, or exploit vulnerabilities in the server programs of a genuine website to insert malicious HTML (HyperText Markup Language) code into some of its web pages, in order to steal private user data such as bank or credit card account numbers and passwords.

The prior art includes various phishing website detection methods. However, these methods are complex to implement, so the system cannot process phishing websites in a timely manner.

Disclosure of Invention

In view of this, the present invention provides a method and an apparatus for detecting phishing websites, so as to improve the timeliness of system processing.

In order to solve the above technical problem, the present invention provides a method for detecting a phishing website, comprising:

acquiring first domain name information of a website to be detected and second domain name information of a target website;

if the first domain name information is similar to the second domain name information, taking the website to be detected as a website to be confirmed;

acquiring first website page content identification information of the website to be confirmed and second website page content identification information of a target website;

and if the first website page content identification information is the same as the second website page content identification information, determining that the website to be confirmed is a phishing website.

Wherein, if the first website page content identification information is the same as the second website page content identification information, the step of determining that the website to be confirmed is a phishing website comprises:

acquiring the IP attribution information of the website to be confirmed and the IP attribution information of the target website;

if the IP attribution information of the website to be confirmed is inconsistent with the IP attribution information of the target website, determining that the website to be confirmed is a phishing website;

and if the IP attribution information of the website to be confirmed is consistent with the IP attribution information of the target website, acquiring link ratio information of the website to be confirmed, and determining that the website to be confirmed is a phishing website when the link ratio information meets a preset condition.

Wherein, if the IP attribution information of the website to be confirmed is consistent with the IP attribution information of the target website, the step of obtaining the link ratio information of the website to be confirmed and determining the website to be confirmed to be a phishing website when determining that the link ratio information meets the preset condition comprises the following steps:

acquiring link ratio information of the website to be confirmed, wherein the link ratio information comprises the ratio of the website to be confirmed to be linked to the target website;

and if the ratio of the website to be confirmed to be linked to the target website is greater than or equal to a first preset value, determining that the website to be confirmed is a phishing website.

Wherein the link ratio information further includes an abnormal link ratio of the website to be confirmed;

the step of acquiring link ratio information of the website to be confirmed if the IP attribution information of the website to be confirmed is consistent with the IP attribution information of the target website, and determining that the website to be confirmed is a phishing website when the link ratio information is determined to meet a preset condition, further includes:

if the ratio of the website to be confirmed to be linked to the target website is smaller than the first preset value, determining whether the sum of the ratio of the website to be confirmed to be linked to the target website and the abnormal link ratio is larger than or equal to a second preset value;

and if the sum of the ratio of the website to be confirmed to be linked to the target website and the abnormal link ratio is greater than or equal to the second preset value, determining that the website to be confirmed is a phishing website.

Before the step of obtaining the link ratio information of the website to be confirmed, the method further comprises:

detecting the webpage content of the website to be confirmed;

the acquiring of the link ratio information of the website to be confirmed specifically includes:

and when the website to be confirmed is detected to comprise preset content, acquiring link ratio information of the website to be confirmed.

The ratio of the website to be confirmed to be linked to the target website is the quotient of the number of links of the website to be confirmed to the target website and the total number of hyperlinks of the website to be confirmed;

the abnormal link ratio is the quotient of the abnormal link times of the website to be confirmed and the total number of hyperlinks of the website to be confirmed.

The identification information of the first website page content of the website to be confirmed is ICP number information of the website to be confirmed; and the second website page content identification information of the target website is ICP number information of the target website.

In a second aspect, the present invention provides a phishing website detection device, including:

the first information acquisition module is used for acquiring first domain name information of a website to be detected and second domain name information of a target website;

the domain name detection module is used for taking the website to be detected as a website to be confirmed if the first domain name information is similar to the second domain name information;

the second information acquisition module is used for acquiring the first website page content identification information of the website to be confirmed and the second website page content identification information of the target website;

and the determining module is used for determining that the website to be confirmed is a phishing website if the first website page content identification information is the same as the second website page content identification information.

Wherein the determining module comprises:

the first information acquisition submodule is used for acquiring the IP attribution information of the website to be confirmed and the IP attribution information of the target website;

the first determining submodule is used for determining the website to be confirmed as a phishing website if the IP attribution information of the website to be confirmed is inconsistent with the IP attribution information of the target website;

and the second determining submodule is used for acquiring the link ratio information of the website to be confirmed if the IP attribution information of the website to be confirmed is consistent with the IP attribution information of the target website, and determining that the website to be confirmed is a phishing website when the link ratio information is determined to meet the preset condition.

Wherein the second determination submodule includes:

an information acquisition unit that acquires link ratio information of the website to be confirmed, the link ratio information including a ratio at which the website to be confirmed links to the target website;

the first determining unit is used for determining that the website to be confirmed is a phishing website if the ratio of the website to be confirmed to be linked to the target website is greater than or equal to a first preset value.

Wherein the link ratio information further includes an abnormal link ratio of the website to be confirmed; the second determination sub-module further includes:

the first judging unit is used for determining whether the sum of the ratio of the website to be confirmed to be linked to the target website and the abnormal link ratio is greater than or equal to a second preset value or not if the ratio of the website to be confirmed to be linked to the target website is smaller than the first preset value;

and the second determining unit is used for determining that the website to be confirmed is a phishing website if the sum of the ratio of the website to be confirmed to be linked to the target website and the abnormal link ratio is greater than or equal to the second preset value.

Wherein the second determination sub-module further includes:

the content detection unit is used for detecting the webpage content of the website to be confirmed;

the information obtaining unit is specifically configured to obtain link ratio information of the website to be confirmed when it is detected that the website to be confirmed includes preset content.

The technical scheme of the invention has the following beneficial effects:

in the embodiment of the invention, when the website to be detected is determined to be the website to be confirmed according to the domain name information of the website to be detected, the website to be confirmed can be determined to be the phishing website by comparing the website page content identification information of the website to be confirmed with the website page content identification information of the target website. Because the website page content identification information is convenient to acquire, whether the website to be confirmed is a phishing website can be quickly determined by using the scheme of the embodiment of the invention, and the timeliness of system processing is further improved.

Drawings

FIG. 1 is a flowchart illustrating a method for detecting phishing websites according to a first embodiment of the invention;

FIG. 2 is a schematic diagram of a network architecture for phishing website detection according to a second embodiment of the present invention;

FIG. 3 is a block diagram of a phishing website identification detection system in accordance with a second embodiment of the present invention;

FIG. 4 is a flowchart illustrating a method for detecting phishing websites according to a second embodiment of the invention;

FIG. 5 is a diagram illustrating a phishing website determination performed in combination with IP attribution information according to a second embodiment of the present invention;

FIG. 6 is a diagram illustrating the determination of phishing websites with reference to link ratio information according to a second embodiment of the present invention;

FIG. 7 is a schematic diagram of a detection device for a phishing website according to a third embodiment of the invention.

Detailed Description

The following detailed description of embodiments of the present invention will be made with reference to the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.

At present, common phishing website detection methods mainly include the following:

(1) Similar domain name detection: in order to deceive the user, the domain name of a phishing website is generally similar to that of the phishing target website, so a phishing website can be detected by similar domain name detection.

(2) Page content detection: a phishing website page generally asks the user to input sensitive information such as a bank card number, a password, an identity card number, or a mobile phone number, so a phishing website can be detected by page content detection.

(3) Page similarity detection: in order to deceive the user, the content and layout of a phishing website page are generally similar to those of the phishing target website, so a phishing website can be detected by page similarity detection.

The similar domain name detection technology and the page content detection in the prior art both have the problem of false alarm, and the page similarity detection technology has the problem of low detection efficiency due to the need of comparing the whole page. Although the existing phishing website detection system can integrate several existing detection technologies, the problem of low detection efficiency still exists, and the system processing is not timely.

Example one

As shown in fig. 1, a method for detecting a phishing website according to a first embodiment of the present invention includes:

Step 101, acquiring first domain name information of a website to be detected and second domain name information of a target website.

The website to be detected may include one or more websites. If a plurality of websites are included, each website can be detected according to the detection method of the embodiment of the invention. The target website refers to a website attacked by the phishing website, namely, a phishing target website.

In the embodiment of the invention, when a user accesses the internet, the access behavior is recorded regardless of whether the user visits a target website or a phishing website. The URL (Uniform Resource Locator) of the website to be detected and the URL of the target website are extracted by analyzing the acquired user internet traffic or user internet logs.

Step 102, if the first domain name information is similar to the second domain name information, taking the website to be detected as a website to be confirmed.

The website to be confirmed refers to a website to be detected that may be a phishing website.

In this step, similar domain name detection is performed according to the acquired first domain name information of the website to be detected and the second domain name information of the target website. Specifically, a similar domain name regular expression is constructed from the domain name of the target website, where the similarity rule is that one or more letters in the domain name are replaced by similar-looking letters or digits, for example, l is replaced by 1 or I. The domain name of the website to be detected is then matched against the expression; if it matches, the website to be detected is taken as the website to be confirmed for further judgment. Otherwise, the website to be detected may be regarded as a non-phishing website.
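The construction of the similar domain name regular expression can be sketched as follows. This is a minimal illustration under stated assumptions: the `LOOKALIKES` table and the function names are hypothetical, only the rule that l may be replaced by 1 or I comes from the description, and excluding the target domain itself from a match is an added assumption.

```python
import re

# Hypothetical look-alike table. Only the "l" entry (l -> 1 or I) is taken
# from the description; the other entries are illustrative assumptions.
LOOKALIKES = {
    "l": "l1I",
    "o": "o0",
    "i": "i1l",
}


def build_similar_domain_regex(target_domain: str) -> re.Pattern:
    """Build a regex that matches the target domain with look-alike substitutions."""
    parts = []
    for ch in target_domain:
        if ch in LOOKALIKES:
            # A character class accepts the original letter or any look-alike.
            parts.append("[" + re.escape(LOOKALIKES[ch]) + "]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def is_similar_domain(candidate: str, target_domain: str) -> bool:
    """True if the candidate matches the look-alike pattern but is not the target itself."""
    if candidate == target_domain:
        return False  # assumption: the genuine domain is never a suspect
    return build_similar_domain_regex(target_domain).fullmatch(candidate) is not None
```

A domain for which `is_similar_domain` returns True would be taken as the website to be confirmed; any other domain would be regarded as a non-phishing website.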

Step 103, acquiring the first website page content identification information of the website to be confirmed and the second website page content identification information of the target website.

In the embodiment of the present invention, the first website page Content identification information of the website to be confirmed is ICP (Internet Content Provider) number information of the website to be confirmed; and the second website page content identification information of the target website is ICP number information of the target website.

In general, to deceive visitors, a phishing website page imitates the phishing target website as closely as possible, including the page layout, color scheme, page images, and page text; the exception is the input boxes used to trick users into entering sensitive information such as bank card numbers, passwords, identity card numbers, and mobile phone numbers. Under normal conditions, a website that has completed legal filing is assigned ICP number information, which records the filing information of the website. The website usually displays the ICP number information at the bottom of its pages to show that it is a legally registered website.

Research shows that, in order to imitate the phishing target website as closely as possible, a phishing website also displays ICP number information at the bottom of its page, and this ICP number information is copied directly from the phishing target website. Since many phishing websites have this feature, phishing websites can be detected by comparing the ICP number information of the phishing website page with that of the phishing target website page.

In a specific application, a page (such as the home page) of the website to be confirmed can be obtained through a page crawling technology, and the corresponding first ICP number information is then extracted from it. Similarly, a page (such as the home page) of the target website can be obtained through the page crawling technology, and the corresponding second ICP number information is then extracted from it.
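Extracting the ICP number information from a crawled page can be sketched as follows. The regular expression reflects an assumed footer layout (one Chinese character for the province, then "ICP", then "备" or "证", then the serial number); the description only states that the ICP number information contains province information and number information. The name `extract_icp` is hypothetical, and the page is assumed to have been fetched already.

```python
import re
from typing import Optional, Tuple

# Assumed layout of an ICP number in a page footer, for example
# "<province>ICP备<digits>号-1". This pattern is an assumption, not part of
# the described method.
ICP_PATTERN = re.compile(
    r"([\u4e00-\u9fa5])\s*ICP\s*[备证]\s*(\d+)\s*号?(?:-\d+)?",
    re.IGNORECASE,
)


def extract_icp(html: str) -> Optional[Tuple[str, str]]:
    """Return (province, number) of the last ICP number found in the page, or None.

    The last match is used because the ICP number is usually marked at the
    bottom of the page.
    """
    matches = ICP_PATTERN.findall(html)
    return matches[-1] if matches else None
```

The same function would be applied to the page of the website to be confirmed and to the page of the target website, and the two results compared for equality.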

Step 104, if the first website page content identification information is the same as the second website page content identification information, determining that the website to be confirmed is a phishing website.

By comparison, if the first ICP number information and the second ICP number information are the same, the website to be confirmed can be determined to be a phishing website.

As can be seen from the above, in the embodiment of the present invention, when the website to be detected is determined to be the website to be confirmed according to the domain name information of the website to be detected, the website to be confirmed can be determined to be the phishing website by comparing the website page content identification information of the website to be confirmed with the website page content identification information of the target website. Because the website page content identification information is convenient to acquire, whether the website to be confirmed is a phishing website can be quickly determined by using the scheme of the embodiment of the invention, and the timeliness of system processing is further improved.

Example two

Fig. 2 is a schematic diagram of a network architecture for phishing website detection, taking internet access via a mobile phone as an example. In practical applications, the internet can also be accessed through devices such as a computer. Phishing website detection mainly involves a phishing website, a phishing target website, a traffic mirroring system or an internet log retention system, and a phishing website identification detection system.

When a user accesses the internet (for example, through a mobile phone), the traffic mirroring system or the internet log retention system faithfully records the user's internet behavior by traffic mirroring or log retention, regardless of whether the user visits a normal website or a phishing website. The phishing website identification detection system acquires the user internet traffic or user internet logs from the traffic mirroring system or the internet log retention system and uses them to identify and detect phishing websites. The phishing website identification detection system needs to access the internet in order to crawl the ICP number information in a page of the phishing target website, acquire the IP (Internet Protocol) address and attribution information of the phishing target website, and acquire the page content of the phishing website to be confirmed.

In the following embodiments, the website to be confirmed is referred to as a suspected phishing website, and the target website is referred to as a phishing target website. The structure of the phishing website identification detection system is shown in fig. 3. The functions and the working principle of each main functional module are as follows:

1. Phishing target website information acquisition module

The function of the phishing target website information acquisition module is to obtain information such as the IP, IP attribution, and page ICP number of the phishing target website. The specific functions and implementation of the module are as follows:

(1) using commands such as nslookup and host, or a DNS (Domain Name System) query tool, to query and acquire the IP or IP list of the phishing target website;

(2) acquiring the attribution or attribution list of the phishing target website by querying an online or offline IP attribution information base according to the acquired IP or IP list of the phishing target website;

(3) crawling the home page or any other page of the phishing target website, and extracting the ICP number information marked at the bottom of the page, wherein the ICP number information comprises province information and serial number information;

(4) storing the IP or IP list information, the attribution or attribution list information, and the ICP number information obtained by the above query or acquisition.
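Items (1) and (2) above can be sketched as follows. The sketch assumes IPv4 only, uses the standard-library resolver in place of the nslookup or host commands, and models the offline IP attribution information base as a small in-memory table. The table contents (documentation address ranges and the labels "Region A" and "Region B") and the function names are hypothetical.

```python
import ipaddress
import socket
from typing import Callable, List, Tuple


def resolve_ips(domain: str, resolver: Callable = socket.gethostbyname_ex) -> List[str]:
    """Return the de-duplicated IPv4 list of a domain.

    The resolver is injectable so that the lookup can be replaced or tested
    without network access.
    """
    _hostname, _aliases, addresses = resolver(domain)
    return sorted(set(addresses))


# Hypothetical offline attribution base: (network, attribution) pairs.
ATTRIBUTION_DB: List[Tuple[ipaddress.IPv4Network, str]] = [
    (ipaddress.ip_network("192.0.2.0/24"), "Region A"),
    (ipaddress.ip_network("198.51.100.0/24"), "Region B"),
]


def lookup_attribution(ip: str) -> str:
    """Return the attribution of an IP address, or "unknown" if it is not listed."""
    addr = ipaddress.ip_address(ip)
    for network, attribution in ATTRIBUTION_DB:
        if addr in network:
            return attribution
    return "unknown"
```

Calling `resolve_ips` on the phishing target domain and mapping each result through `lookup_attribution` would yield the IP list and attribution list that item (4) stores.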

2. Flow or log collection and analysis module

The function of the flow or log collection and analysis module is to collect the traffic of the traffic mirroring system or the logs of the internet log retention system, then analyze the traffic or logs and extract the key information used for phishing website identification and detection. The specific functions and implementation of the module are as follows:

(1) receiving the traffic mirrored by the traffic mirroring system, or actively acquiring the logs of the internet log retention system, wherein the logs can be actively acquired via FTP (File Transfer Protocol) or via an API (Application Program Interface);

(2) analyzing the traffic by using the libpcap library, or analyzing the logs according to the format of the retained internet logs;

(3) extracting the access URL, the access target IP, and the access target port from the traffic or logs for subsequent phishing website identification and detection.
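The log branch of item (3) can be sketched as follows. The description does not specify the log format, so the whitespace-separated layout `<timestamp> <target IP> <target port> <URL>` used here is purely an assumption, as are the names `AccessRecord` and `parse_log_line`.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class AccessRecord(NamedTuple):
    url: str
    domain: str
    target_ip: str
    target_port: int


def parse_log_line(line: str) -> Optional[AccessRecord]:
    """Parse one line of the assumed log format; return None if it does not fit."""
    fields = line.split()
    if len(fields) != 4:
        return None
    _timestamp, target_ip, port_text, url = fields
    if not port_text.isdigit():
        return None
    # hostname is None when the URL has no network location (e.g. no scheme).
    domain = urlsplit(url).hostname
    if domain is None:
        return None
    return AccessRecord(url, domain, target_ip, int(port_text))
```

The `domain` field would feed the similar domain name detection, while `target_ip` would later be used to query the attribution of a suspected phishing website.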

3. Similar domain name detection module

The function of the similar domain name detection module is to detect, according to the domain name of the phishing target website that needs protection, whether the domain name in a URL extracted by the previous module is a similar domain name. The specific functions and implementation of the module are as follows:

(1) constructing a similar domain name regular expression according to the domain name of the phishing target website, wherein the similarity rule is that letters in the domain name are replaced by similar-looking letters or digits, for example, l is replaced by 1 or I;

(2) matching the domain name in the URL extracted by the flow or log collection and analysis module against the similar domain name regular expression, wherein a domain name that is hit is taken as a domain name to be confirmed for further identification and detection, and a domain name that is not hit is not processed further and can be regarded as belonging to a non-phishing website.

4. Suspected phishing website page crawling module

The function of the suspected phishing website page crawling module is to crawl the page content of the suspected phishing website and to provide detection data input for the ICP number comparison detection module, the link ratio calculation module, and the phishing website comprehensive identification and judgment module. The specific functions and implementation of the module are as follows:

(1) for a domain name matched by the similar domain name detection module, crawling page content based on the original URL from the flow or log collection and analysis module, for example, crawling the home page content corresponding to the domain name;

(2) extracting the ICP number information, including province information and number information, from the crawled suspected phishing website page;

(3) querying the attribution information of the suspected phishing website according to the target IP information extracted by the flow or log collection and analysis module.

5. ICP number comparison detection module

The function of the ICP number comparison detection module is to compare the ICP information of the suspected phishing website page with the ICP information of the phishing target website page, and to judge whether the suspected phishing website is a phishing website in combination with the domain name and the IP attribution information. The specific functions and implementation of the module are as follows:

(1) comparing ICP information of the phishing target website and the suspected phishing website, wherein the ICP information comprises province information and number information;

(2) if the ICP information and the domain name are the same, and the IP attribution information is different, directly judging that the website to be detected is a phishing website;

(3) if the ICP information and the domain name are the same, and the IP attribution information is the same, the phishing website comprehensive identification judgment module is required to make further judgment;

(4) if the ICP information of the suspected phishing website differs from that of the phishing target website, the suspected phishing website can be directly judged to be a non-phishing website.
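The decision made in items (2) to (4) can be sketched as follows for a website that has already passed similar domain name detection. Two points are assumptions rather than statements of the description: a page with no ICP number is treated as having different ICP information, and the attribution is treated as consistent when the suspected website's attribution appears in the target website's attribution list. All names are hypothetical.

```python
from enum import Enum
from typing import Collection, Optional, Tuple


class Verdict(Enum):
    PHISHING = "phishing"
    NOT_PHISHING = "non-phishing"
    NEEDS_LINK_RATIO_CHECK = "needs further judgment"


def compare_icp(
    suspect_icp: Optional[Tuple[str, str]],
    target_icp: Tuple[str, str],
    suspect_attribution: str,
    target_attributions: Collection[str],
) -> Verdict:
    """Apply the ICP comparison rules to a website with a similar domain name."""
    # Item (4): different (or, by assumption, missing) ICP information.
    if suspect_icp is None or suspect_icp != target_icp:
        return Verdict.NOT_PHISHING
    # Item (2): same ICP information but a different IP attribution.
    if suspect_attribution not in target_attributions:
        return Verdict.PHISHING
    # Item (3): same ICP information and same attribution; defer to the
    # phishing website comprehensive identification and judgment module.
    return Verdict.NEEDS_LINK_RATIO_CHECK
```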

6. Link ratio calculation module

The function of the link ratio calculation module is to calculate, from the suspected phishing website pages crawled by the suspected phishing website page crawling module, the ratio of links on the suspected phishing website home page that point to the phishing target website, the ratio of abnormal links, and the sum of these two ratios.

Here, an abnormal link is a link that does not normally occur in a normal web page, such as an empty link or a link to the current path "/". The abnormal link specifically includes the following cases:

The specific functions and implementation of the module are as follows:

(1) acquiring all hyperlinks in a home page of a suspected phishing website;

(2) calculating the link ratio of the links to the phishing target website for all the links;

(3) calculating the ratio of abnormal links for all links;

(4) the sum of the ratio of links to the phishing target website and the ratio of abnormal links is calculated.

The specific process is as follows: suppose that the total number of hyperlinks in the home page of the suspected phishing website is z, the number of links to the phishing target website is x, and the number of abnormal links is y. Let the ratio of links to the phishing target website be p, the ratio of abnormal links be q, and the sum of the ratio of links to the phishing target website and the ratio of abnormal links be r.

Wherein, the calculation formula of the link ratio p to the phishing target website is as follows:

$$p = \frac{x}{z}$$

the calculation formula of the ratio q of abnormal links is as follows:

$$q = \frac{y}{z}$$

the calculation formula of the sum r of the link ratio to the phishing target website and the ratio of abnormal links is as follows:

$$r = p + q = \frac{x + y}{z}$$
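The calculation of p, q, and r can be sketched as follows, with each ratio taken as a quotient over the total number of hyperlinks z. The sketch makes several assumptions: only the two abnormal cases named in the text (an empty link and the current path "/") are counted, a link to a subdomain of the target domain is counted as a link to the target website, and a page without hyperlinks yields zero for all three values. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Tuple
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag; a missing href is stored as an empty link."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.append((dict(attrs).get("href") or "").strip())


def points_to(href: str, target_domain: str) -> bool:
    """True if the link host is the target domain or a subdomain of it (assumption)."""
    try:
        host = urlsplit(href).hostname or ""
    except ValueError:  # malformed URL
        return False
    return host == target_domain or host.endswith("." + target_domain)


def link_ratios(html: str, target_domain: str) -> Tuple[float, float, float]:
    """Return (p, q, r) for the page: p = x / z, q = y / z, r = p + q."""
    collector = LinkCollector()
    collector.feed(html)
    hrefs = collector.hrefs
    z = len(hrefs)
    if z == 0:
        return 0.0, 0.0, 0.0  # assumption: no hyperlinks means all ratios are zero
    x = sum(points_to(h, target_domain) for h in hrefs)
    y = sum(h in ("", "/") for h in hrefs)  # only the two cases named in the text
    p, q = x / z, y / z
    return p, q, p + q
```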

7. Phishing website comprehensive identification and judgment module

The function of the phishing website comprehensive identification and judgment module is to further judge suspected phishing websites that cannot be judged by the ICP number comparison detection module. The specific functions and implementation of the module are as follows:

(1) detecting the content of the webpage of the suspected phishing website which cannot be judged by the ICP number comparison detection module, and detecting whether sensitive information such as a bank card number, a password, an identity card number or a mobile phone number needs to be input in the webpage or not;

(2) if the webpage content detection does not find that sensitive information such as a bank card number, a password, an identification card number or a mobile phone number needs to be input, judging that the suspected phishing website is a non-phishing website;

(3) if the webpage content detection finds that sensitive information such as a bank card number, a password, an identification card number or a mobile phone number needs to be input, further judging the link ratio;

(4) if the ratio p of links to the phishing target website is greater than or equal to a specified threshold δ (such as 70%), namely p ≥ δ, the suspected phishing website is judged to be a phishing website;

(5) if the ratio p of links to the phishing target website is less than the specified threshold δ, namely p < δ, the sum r of the ratio of links to the phishing target website and the ratio of abnormal links is compared with a specified threshold θ (such as 80%); if r is greater than or equal to θ, namely r ≥ θ, the suspected phishing website is judged to be a phishing website;

(6) if the ratio p of links to the phishing target website is less than the specified threshold δ and the sum r of the ratio of links to the phishing target website and the ratio of abnormal links is less than the specified threshold θ, namely p < δ and r < θ, the suspected phishing website is judged to be a non-phishing website.
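The judgment in items (1) to (6) can be sketched as follows. The default thresholds use the example values given in the text (70% and 80%); `q` denotes the ratio of abnormal links, so that r = p + q. The function name and the boolean parameter are hypothetical.

```python
def judge_by_link_ratio(
    has_sensitive_inputs: bool,
    p: float,
    q: float,
    delta: float = 0.70,
    theta: float = 0.80,
) -> bool:
    """Return True if the suspected phishing website is judged to be a phishing website.

    p is the ratio of links to the phishing target website and q is the ratio
    of abnormal links.
    """
    if not has_sensitive_inputs:
        return False  # item (2): no sensitive input requested
    if p >= delta:
        return True  # item (4)
    r = p + q
    return r >= theta  # items (5) and (6)
```

Checking p first means the abnormal link ratio only matters for pages whose share of links to the target website falls below the first threshold.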

As shown in fig. 4, the method for detecting a phishing website according to the second embodiment of the present invention includes:

Step 401, obtaining first domain name information of a website to be detected and second domain name information of a phishing target website.

Step 402, if the first domain name information is similar to the second domain name information, the website to be detected is taken as a suspected phishing website.

For the processes of step 401 and step 402, refer to the working processes of the flow or log collection and analysis module, the phishing target website information acquisition module, and the similar domain name detection module.

Step 403, acquiring the first website page content identification information of the suspected phishing website and the second website page content identification information of the phishing target website.

The specific process can refer to the working processes of the suspected phishing website page crawling module and the phishing target website information acquisition module.

Step 404, if the first website page content identification information is the same as the second website page content identification information, determining that the suspected phishing website is a phishing website, otherwise, determining that the suspected phishing website is a non-phishing website.

Specifically, in this step, in order to improve the detection accuracy, as shown in fig. 5, the following process is included:

Step 501, acquiring the IP attribution information of the suspected phishing website and the IP attribution information of the phishing target website.

Step 502, if the IP attribution information of the suspected phishing website is inconsistent with the IP attribution information of the phishing target website, determining that the suspected phishing website is a phishing website.

Step 503, if the IP attribution information of the suspected phishing website is consistent with the IP attribution information of the phishing target website, acquiring link ratio information of the suspected phishing website, and determining that the suspected phishing website is a phishing website when the link ratio information is determined to meet a preset condition.

Specifically, this step is shown in fig. 6:

Step 601, detecting the webpage content of the suspected phishing website.

Step 602, when it is detected that the suspected phishing website includes preset content, obtaining link ratio information of the suspected phishing website. And when the suspected phishing website is detected not to include the preset content, determining that the suspected phishing website is a non-phishing website.

The link ratio information includes: the ratio of the suspected phishing website linked to the phishing target website, and the abnormal link ratio of the suspected phishing website. The preset content can be a request to input sensitive information such as a bank card number, a password, an identification card number or a mobile phone number.

Step 603, if the ratio of the suspected phishing website linked to the phishing target website is greater than or equal to a first preset value, determining that the suspected phishing website is a phishing website.

Step 604, if the ratio of the suspected phishing website linked to the phishing target website is smaller than the first preset value, determining whether the sum of the ratio of the suspected phishing website linked to the phishing target website and the abnormal link ratio is greater than or equal to a second preset value.

Step 605, if the sum of the ratio of the suspected phishing website linked to the phishing target website and the abnormal link ratio is greater than or equal to the second preset value, determining that the suspected phishing website is a phishing website.

The first preset value and the second preset value can be set arbitrarily.

After a website is confirmed as a phishing website, its information can be stored in a phishing website library.
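The decision flow of steps 404, 501 to 503 and 601 to 605 can be sketched as below. This is a minimal sketch under stated assumptions, not the implementation of the invention: the `SiteInfo` record, its field names and the function `is_phishing` are hypothetical, the page content identification information is taken to be the ICP number as recited in the claims, and the default preset values of 0.70 and 0.80 are borrowed from the example thresholds of the preceding embodiment. The website is assumed to have already passed the similar domain name check of step 402.

```python
from dataclasses import dataclass


@dataclass
class SiteInfo:
    """Facts gathered about one website (all field names are hypothetical)."""
    icp_number: str                     # page content identification information
    ip_attribution: str                 # IP attribution information
    has_preset_content: bool = False    # page asks for sensitive input
    target_link_ratio: float = 0.0      # ratio of links to the target website
    abnormal_link_ratio: float = 0.0    # ratio of abnormal links


def is_phishing(suspect: SiteInfo, target: SiteInfo,
                first_preset: float = 0.70,
                second_preset: float = 0.80) -> bool:
    """Apply steps 404, 501-503 and 601-605 to a suspected phishing website."""
    # Step 404: different identification information -> non-phishing.
    if suspect.icp_number != target.icp_number:
        return False
    # Step 502: same identification but inconsistent IP attribution -> phishing.
    if suspect.ip_attribution != target.ip_attribution:
        return True
    # Step 602: without the preset content the site is non-phishing.
    if not suspect.has_preset_content:
        return False
    # Step 603: enough links point to the target website.
    if suspect.target_link_ratio >= first_preset:
        return True
    # Steps 604-605: fall back to the sum with the abnormal link ratio.
    total = suspect.target_link_ratio + suspect.abnormal_link_ratio
    return total >= second_preset
```

The checks are ordered from cheapest to most expensive, so the link ratio information only needs to be gathered for websites that pass the identification, IP attribution and content checks.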

Therefore, in the embodiment of the invention, the detection efficiency of the phishing website can be further improved and the detection accuracy of the phishing website can be improved by combining the similar domain name detection technology, the IP attribution and ICP detection technology and the like. Meanwhile, ICP filing information does not need to be stored and inquired in the scheme, so that the storage amount and the calculated amount of the system are saved, the complexity of system design is simplified, and the timeliness and the reliability of system processing are improved.

EXAMPLE III

As shown in fig. 7, the third embodiment of the present invention provides a phishing website detection apparatus, including:

a first information obtaining module 701, configured to obtain first domain name information of a to-be-detected website and second domain name information of a target website; a domain name detection module 702, configured to use the to-be-detected website as a website to be confirmed if the first domain name information is similar to the second domain name information; a second information obtaining module 703, configured to obtain the first website page content identification information of the website to be confirmed and the second website page content identification information of the target website; a determining module 704, configured to determine that the website to be confirmed is a phishing website if the first website page content identification information is the same as the second website page content identification information.

Wherein the determining module 704 comprises: the first information acquisition submodule is used for acquiring the IP attribution information of the website to be confirmed and the IP attribution information of the target website; the first determining submodule is used for determining the website to be confirmed as a phishing website if the IP attribution information of the website to be confirmed is inconsistent with the IP attribution information of the target website; and the second determining submodule is used for acquiring the link ratio information of the website to be confirmed if the IP attribution information of the website to be confirmed is consistent with the IP attribution information of the target website, and determining that the website to be confirmed is a phishing website when the link ratio information is determined to meet the preset condition.

Specifically, the second determining submodule includes: an information acquisition unit that acquires link ratio information of the website to be confirmed, the link ratio information including a ratio at which the website to be confirmed links to the target website; the first determining unit is used for determining that the website to be confirmed is a phishing website if the ratio of the website to be confirmed to be linked to the target website is greater than or equal to a first preset value.

In addition, the link ratio information also comprises an abnormal link ratio of the website to be confirmed; the second determination sub-module further includes: the first judging unit is used for determining whether the sum of the ratio of the website to be confirmed to be linked to the target website and the abnormal link ratio is greater than or equal to a second preset value or not if the ratio of the website to be confirmed to be linked to the target website is smaller than the first preset value; and the second determining unit is used for determining that the website to be confirmed is a phishing website if the sum of the ratio of the website to be confirmed to be linked to the target website and the abnormal link ratio is greater than or equal to the second preset value.

To further improve the accuracy, the second determining sub-module further includes: the content detection unit is used for detecting the webpage content of the website to be confirmed; the information obtaining unit is specifically configured to obtain link ratio information of the website to be confirmed when it is detected that the website to be confirmed includes preset content.

In the embodiment of the invention, the ratio of the website to be confirmed linked to the target website is the quotient of the number of links from the website to be confirmed to the target website and the total number of hyperlinks of the website to be confirmed; the abnormal link ratio is the quotient of the number of abnormal links of the website to be confirmed and the total number of hyperlinks of the website to be confirmed.
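The two quotients can be computed from the list of hyperlinks found on a page, as in the following sketch. The function name `link_ratios` and its parameters are assumptions made for illustration. How a link is matched to the target website (here: same host or a subdomain of it) and what counts as an abnormal link are not specified in this passage, so the abnormality test is passed in as a caller-supplied predicate.

```python
from typing import Callable, Iterable, Tuple
from urllib.parse import urlparse


def link_ratios(hrefs: Iterable[str], target_domain: str,
                is_abnormal: Callable[[str], bool]) -> Tuple[float, float]:
    """Return (ratio of links to the target website, abnormal link ratio)."""
    links = list(hrefs)
    total = len(links)              # total number of hyperlinks on the page
    if total == 0:
        return 0.0, 0.0             # assumption: no hyperlinks gives zero ratios
    to_target = 0
    for href in links:
        host = urlparse(href).hostname or ""
        # Assumption: a link counts if its host is the target domain
        # or one of its subdomains.
        if host == target_domain or host.endswith("." + target_domain):
            to_target += 1
    abnormal = sum(1 for href in links if is_abnormal(href))
    return to_target / total, abnormal / total
```

As a usage example, with a hypothetical predicate that treats empty and `#` links as abnormal, a page with four hyperlinks of which two point to `bank.example` and one is `#` yields ratios of 0.5 and 0.25.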

For the working principle of the device according to the invention, reference may be made to the description of the method embodiments above.

In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. The apparatus embodiments described above are merely illustrative: the division into units is only a logical division, and other divisions are possible in practice. For example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be physically included alone, or two or more units may be integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.

The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform some steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A phishing website detection method is characterized by comprising the following steps:

acquiring first website page content identification information of the website to be confirmed and second website page content identification information of a target website; the first website page content identification information of the website to be confirmed is Internet Content Provider (ICP) number information of the website to be confirmed; the second website page content identification information of the target website is ICP number information of the target website;

if the first website page content identification information is the same as the second website page content identification information, judging whether the website to be confirmed is a phishing website:

2. The method according to claim 1, wherein the step of acquiring the link ratio information of the website to be confirmed if the IP attribution information of the website to be confirmed is consistent with the IP attribution information of the target website, and determining that the website to be confirmed is a phishing website when it is determined that the link ratio information satisfies a preset condition, comprises:

3. The method according to claim 2, wherein the link ratio information further includes an abnormal link ratio of the website to be confirmed;

4. The method according to claim 2 or 3, further comprising, before the step of obtaining the link ratio information of the website to be confirmed:

detecting the webpage content of the website to be confirmed;

5. The method of claim 3,

the ratio of the website to be confirmed to be linked to the target website is the quotient of the number of times of the website to be confirmed to be linked to the target website and the total number of hyperlinks of the website to be confirmed;

6. A phishing website detection apparatus, comprising:

the second information acquisition module is used for acquiring the first website page content identification information of the website to be confirmed and the second website page content identification information of the target website; the first website page content identification information of the website to be confirmed is Internet Content Provider (ICP) number information of the website to be confirmed; the second website page content identification information of the target website is ICP number information of the target website;

the determining module is used for judging whether the website to be confirmed is a phishing website or not if the first website page content identification information is the same as the second website page content identification information;

the determining module comprises:

7. The apparatus of claim 6, wherein the second determination submodule comprises:

8. The apparatus according to claim 7, wherein the link ratio information further includes an abnormal link ratio of the website to be confirmed; the second determination sub-module further includes:

9. The apparatus of claim 7 or claim 8, wherein the second determination submodule further comprises:

10. The apparatus of claim 8,