CN115603924A

CN115603924A - Detection method and device for phishing mails, electronic equipment and storage medium

Info

Publication number: CN115603924A
Application number: CN202110723018.4A
Authority: CN
Inventors: 宁阳; 闫凡; 郜振锋; 郑景中; 王雄; 许云中
Original assignee: Sangfor Technologies Co Ltd
Current assignee: Sangfor Technologies Co Ltd
Priority date: 2021-06-28
Filing date: 2021-06-28
Publication date: 2023-01-13

Abstract

The application discloses a method for detecting phishing mails, comprising the following steps: acquiring at least one Uniform Resource Locator (URL) link in a target mail, and determining at least one first-level domain name in the URL link; deleting the known domain names from the at least one first-level domain name to obtain a suspicious URL set, wherein the known domain names are the intersection of the at least one first-level domain name and a set of safe domain names; and, if the suspicious URL set is not empty, outputting a phishing mail detection result according to the similarity between the suspicious URL set and a safe URL link. The method can improve the accuracy of phishing mail detection. The application also discloses a phishing mail detection device, an electronic device and a storage medium, which have the same beneficial effects.

Description

Detection method and device for phishing mails, electronic equipment and storage medium

Technical Field

The present application relates to the field of network security technologies, and in particular, to a method and an apparatus for detecting phishing mails, an electronic device, and a storage medium.

Background

Phishing mail typically tricks the recipient into replying to a designated address with an account number, password, and the like, or directs the recipient to a specially crafted web page. The web page that a phishing mail points to is usually disguised as a real website, such as a bank or financial web page, so that the user believes it is genuine and enters a credit card or bank card number, account name, password and the like, which are then stolen. Detection of phishing mail is therefore a major concern for network security personnel.

In the related art, phishing mails are mainly identified based on the mailbox name and the sender IP address. However, because the mailbox name and the sender IP address are easily disguised and changed, this scheme detects phishing mails with low accuracy.

Therefore, how to improve the accuracy of detecting phishing mails is a technical problem to be solved by those skilled in the art.

Disclosure of Invention

The application aims to provide a method and a device for detecting phishing mails, electronic equipment and a storage medium, which can improve the accuracy rate of detecting the phishing mails.

In order to solve the above technical problem, the present application provides a method for detecting phishing mails, including:

acquiring at least one Uniform Resource Locator (URL) link in a target mail, and determining at least one first-level domain name in the URL link;

deleting the known domain names from the at least one first-level domain name to obtain a suspicious URL set, wherein the known domain names are the intersection of the at least one first-level domain name and a set of safe domain names;

and if the suspicious URL set is not empty, outputting a phishing mail detection result according to the similarity between the suspicious URL set and the safe URL link.

Optionally, outputting a phishing mail detection result according to the similarity between the suspicious URL set and the safe URL link includes:

and if the similarity between the suspicious URL set and the safe URL link is within a first similarity interval, outputting a detection result that the target mail is a phishing mail.

Optionally, the method further includes:

if the similarity between the suspicious URL set and the safe URL link is not within the first similarity interval, performing character replacement on the suspicious URL set to obtain a new suspicious URL set, and judging whether the similarity between the new suspicious URL set and the safe URL link is within a second similarity interval.

Optionally, the method further includes:

and if the similarity between the new suspicious URL set and the safe URL link is within the second similarity interval, outputting the detection result that the target mail is the phishing mail.

Optionally, performing character replacement on the suspicious URL set to obtain a new suspicious URL set, including:

and carrying out isomorphic character replacement and/or punycode code replacement on the suspicious URL set to obtain the new suspicious URL set.

Optionally, the method further includes:

if the similarity between the new suspicious URL set and the safe URL link is not within the second similarity interval, extracting the core key words of each safe domain name in the safe domain name set;

judging whether the suspicious URL set comprises the core keyword or not;

and if so, judging that the target mail is a phishing mail.

Optionally, the extracting the core keyword of each security domain name in the security domain name set includes:

and extracting a difference set of the first-level domain name and the top-level domain name of each safe domain name in the safe domain name set as the core keyword.

Optionally, after determining at least one primary domain name in at least one of the URL links, the method further includes:

and removing repeated segments in the primary domain name.

Optionally, the method further includes:

and if the target mail is a phishing mail, adding the first-level domain name in the URL link to a URL blacklist, and marking the camouflage type of the URL link.

The application also provides a phishing mail detection device, which comprises:

the domain name determining module is used for acquiring at least one Uniform Resource Locator (URL) link in a target mail and determining at least one primary domain name in the URL link;

the suspicious URL determining module is used for deleting a known domain name in the at least one first-level domain name to obtain a suspicious URL set; wherein the known domain name is an intersection of the at least one primary domain name and a set of security domain names;

and the judging module is used for outputting a phishing mail detection result according to the similarity between the suspicious URL set and the safe URL link if the suspicious URL set is not empty.

The application also provides a storage medium, wherein a computer program is stored on the storage medium, and the computer program realizes the steps executed by the detection method of the phishing mails when executed.

The application also provides electronic equipment which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the steps executed by the detection method of the phishing mails when calling the computer program in the memory.

The application provides a phishing mail detection method, which comprises the following steps: acquiring at least one Uniform Resource Locator (URL) link in a target mail, and determining at least one primary domain name in the URL link; deleting a known domain name in the at least one first-level domain name to obtain a suspicious URL set; wherein the known domain name is an intersection of the at least one primary domain name and a set of security domain names; and if the suspicious URL set is not empty, outputting a phishing mail detection result according to the similarity between the suspicious URL set and the safe URL link.

The application acquires at least one URL link in the target mail and compares the at least one first-level domain name of the URL link with the known domain names. Because a phishing mail mainly relies on URL links to redirect the user to another page, a first-level domain name that belongs to the safe domain name set indicates that the address corresponding to the URL link is not a forged web page. If the suspicious URL set is not empty, the URL link contains domain names outside the safe domain name set, and the similarity between the suspicious URL set and the safe URL link is then evaluated. Because detection is based on the URL link content of the target mail, it is not affected by changes to the mailbox name or the sender IP address, disguised phishing mails can be effectively identified, and the accuracy of phishing mail detection is improved. The application also provides a phishing mail detection device, an electronic device and a storage medium, which have the same beneficial effects and are not described again here.

Drawings

In order to more clearly illustrate the embodiments of the present application, the drawings needed for the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.

FIG. 1 is a flowchart of a phishing mail detection method according to an embodiment of the present application;

FIG. 2 is a flowchart of a phishing mail detection method based on URL link comparison according to an embodiment of the present application;

FIG. 3 is a flowchart of a phishing mail detection method based on keywords according to an embodiment of the present application;

FIG. 4 is a flowchart of a phishing mail detection method based on identifying confusing URLs according to an embodiment of the present application;

FIG. 5 is a schematic structural diagram of a phishing mail detection device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Referring to fig. 1, fig. 1 is a flowchart of a phishing mail detection method according to an embodiment of the present application.

The specific steps may include:

S101: acquiring at least one URL (Uniform Resource Locator) link in a target mail, and determining at least one first-level domain name in the at least one URL link;

This embodiment can be applied to electronic equipment such as a firewall, a classified-protection (等保) all-in-one security appliance, or a mailbox server, and the target mail may be a mail of unknown type sent by another terminal. Before the URL link in the target mail is acquired, it may first be determined whether the target mail contains a URL link; if so, the relevant operations of S101 are executed, and if not, the target mail is judged to be a normal mail.

The target mail may include any number of URL links. After acquiring at least one URL link in the target mail, this embodiment may determine at least one first-level domain name in the at least one URL link. Further, this embodiment may acquire all URL links in the target mail and determine all first-level domain names in each URL link, so as to improve the accuracy of phishing mail detection. The domain names in a URL link are separated by dots. Counting from right to left, all characters to the right of the first dot form the top-level domain name, all characters to the right of the second dot form the first-level domain name, and so on; each lower-level domain name contains the entire content of the level above it. For example, in www.abc.def.com, the top-level domain name is .com, the first-level domain name is .def.com, and the second-level domain name is .abc.def.com.
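The right-to-left counting rule above can be illustrated with a minimal Python sketch. The regular expression, the function names `extract_urls` and `first_level_domain`, and the sample mail text are assumptions made for illustration and are not part of the application. The sketch keeps only the last two dot-separated labels, as the text describes, so it does not handle multi-label suffixes such as co.uk.

```python
import re
from urllib.parse import urlparse

# Simplistic, assumed pattern: an http(s) URL runs until whitespace or a quote/bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(mail_text):
    """Return every http(s) URL link found in the mail text."""
    return URL_PATTERN.findall(mail_text)


def first_level_domain(url):
    """Return the last two dot-separated labels of the host, e.g. 'def.com'."""
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


mail = "Please log in at http://www.abc.def.com/login or https://pay.example.org/a?b=1"
urls = extract_urls(mail)
domains = {first_level_domain(u) for u in urls}
print(domains)
```

Collecting the results in a set also deduplicates first-level domain names that occur in several links.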

Further, after the first-level domain name in the URL link is determined, repeated segments in the first-level domain name may also be removed. Specifically, this embodiment may treat content in the URL link whose number of repeated bytes exceeds a preset value as a repeated segment. Deleting the repeated segments reduces the amount of computation in the phishing mail detection process and improves detection efficiency.
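The application does not spell out how a repeated segment is recognized. One possible reading, shown below purely as an assumption, treats a run of the same character that is longer than a preset value as the repeated segment and deletes it. The function name `remove_repeated_segments` and the default threshold of 3 are hypothetical.

```python
import re


def remove_repeated_segments(domain, max_run=3):
    """Delete any run of one character that repeats more than max_run times.

    This is only one interpretation of "repeated segment"; the preset
    value (max_run) is an assumed parameter.
    """
    # (.) captures one character; \1{max_run,} requires at least max_run
    # further copies, i.e. a run of more than max_run identical characters.
    pattern = re.compile(r"(.)\1{%d,}" % max_run)
    return pattern.sub("", domain)


print(remove_repeated_segments("paaaaaypal.com"))
print(remove_repeated_segments("abc.com"))
```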

S102: deleting a known domain name in at least one first-level domain name to obtain a suspicious URL set;

After the first-level domain name of the URL link is determined, it may be matched against the safe domain names in the safe domain name set, and the intersection of the at least one first-level domain name and the safe domain name set is taken as the known domain names. In this step, the known domain names may be deleted from all first-level domain names of all URL links to obtain the suspicious URL set. A safe domain name is a domain name that is known to be safe.
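Step S102 amounts to a set intersection followed by a set difference. A minimal sketch, assuming the first-level domain names are held in a Python set and that `SAFE_DOMAINS` is a hypothetical, locally loaded safe domain name set:

```python
# Hypothetical safe domain name set; a real deployment would load its own list.
SAFE_DOMAINS = {"abc.com", "example.org"}


def suspicious_set(first_level_domains, safe_domains):
    """Remove the known domain names (the intersection) and return the rest."""
    known = first_level_domains & safe_domains   # known domain names
    return first_level_domains - known           # suspicious URL set


found = {"abc.com", "abc-login.net"}
print(suspicious_set(found, SAFE_DOMAINS))
```

An empty result corresponds to the branch in which the target mail is judged to be a normal mail.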

S103: if the suspicious URL set is empty, judging that the target mail is a normal mail;

If the suspicious URL set is empty, every first-level domain name of the URL links is a safe domain name that can be accessed safely, and the target mail can be judged to be a normal mail.

S104: and if the suspicious URL set is not empty, outputting a phishing mail detection result according to the similarity between the suspicious URL set and the safe URL link.

If the suspicious URL set is not empty, the safety of part of contents of the first-level domain name of the URL link is unknown, similarity comparison can be carried out on the contents in the suspicious URL set and the safe URL link, and then a phishing mail detection result is output according to the similarity comparison result.

Specifically, if the similarity between the suspicious URL set and the safe URL link is high, it indicates that the content of the first-level domain name other than the known domain name is safe, and it may be determined that the target email is a normal email. If the similarity between the suspicious URL set and the safe URL link is low, the security of the content except the known domain name in the first-level domain name is unknown, at this time, the target mail can be judged to be a phishing mail, and the suspicious URL set can be further detected by adopting a character replacement and keyword matching method. The secure URL is a known secure URL.

As a possible implementation mode, after the target mail is detected to be the phishing mail, the first-level domain name in the URL link can be added to the URL blacklist, and the camouflage type of the URL link is marked. Specifically, the above-mentioned primary domain name added to the URL blacklist is a primary domain name that makes the target mail determined as a phishing mail. The camouflage types may include: domain name similarity camouflage, domain name character replacement camouflage and domain name keyword camouflage. When receiving the mail, the mail containing the URL link can be screened by utilizing the URL blacklist, and the network security is improved.
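The blacklist maintenance described above could be sketched as follows. The three camouflage types are taken from the text; the dictionary layout and the function names `add_to_blacklist` and `mail_is_blocked` are assumptions for illustration only.

```python
from enum import Enum


class CamouflageType(Enum):
    SIMILARITY = "domain name similarity camouflage"
    CHAR_REPLACEMENT = "domain name character replacement camouflage"
    KEYWORD = "domain name keyword camouflage"


# Assumed structure: first-level domain name -> camouflage type that was detected.
url_blacklist = {}


def add_to_blacklist(domain, camouflage):
    """Record the domain that caused a mail to be judged a phishing mail."""
    url_blacklist[domain] = camouflage


def mail_is_blocked(domains_in_mail):
    """Screen an incoming mail: True if any of its domains is blacklisted."""
    return any(d in url_blacklist for d in domains_in_mail)


add_to_blacklist("paypa1.com", CamouflageType.SIMILARITY)
print(mail_is_blocked({"paypa1.com", "abc.com"}))
```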

In this embodiment, at least one URL link in the target mail is acquired, and the at least one first-level domain name of the URL link is compared with the known domain names. Because a phishing mail mainly relies on URL links to redirect the user to another page, a first-level domain name that belongs to the safe domain name set indicates that the address corresponding to the URL link is not a forged web page. If the suspicious URL set is not empty, the URL link contains domain names outside the safe domain name set, and the similarity between the suspicious URL set and the safe URL link is then evaluated. Because detection is based on the URL link content of the target mail, it is not affected by changes to the mailbox name or the sender IP address, disguised phishing mails can be effectively identified, and the accuracy of phishing mail detection is improved.

Referring to fig. 2, fig. 2 is a flowchart of a method for detecting phishing mails based on URL link comparison according to an embodiment of the present application, where this embodiment is a further description of the method for detecting phishing mails when a suspicious URL set is not empty in the embodiment corresponding to fig. 1, and a further implementation may be obtained by combining this embodiment with the embodiment corresponding to fig. 1, where this embodiment may include the following steps:

S201: calculating the similarity between the suspicious URL set and the safe URL link.

S202: judging whether the similarity between the suspicious URL set and the safe URL link is within a first similarity interval; if yes, entering S206; if not, entering S203.

S203: performing character replacement on the suspicious URL set to obtain a new suspicious URL set.

S204: judging whether the similarity between the new suspicious URL set and the safe URL link is within a second similarity interval; if yes, entering S206; if not, entering S205.

S205: judging that the target mail is a normal mail.

S206: judging that the target mail is a phishing mail.

It will be appreciated that, to avoid being easily recognized by the user, hackers creating phishing mails often disguise the URL link with an illegal domain name that closely resembles a safe domain name. Therefore, in this embodiment the suspicious URL set is compared with the safe URL link for similarity, and if the similarity is within the first similarity interval, the detection result that the target mail is a phishing mail is output. If the similarity is not within the first similarity interval, character replacement is performed on the suspicious URL set, so as to guard against phishing mails forged by means of character replacement. Specifically, the new suspicious URL set may be obtained by performing Punycode (domain name encoding) replacement on the suspicious URL set, and may also be obtained by performing isomorphic character (homoglyph) replacement on the suspicious URL set. If the similarity between the new suspicious URL set and the safe URL link is within the second similarity interval, the detection result that the target mail is a phishing mail is output. The first similarity interval may be 70% to 95%, and the second similarity interval may be 60% to 85%.
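The S201 to S206 flow can be sketched in Python under several assumptions that the application leaves open: similarity is computed here with the standard-library `difflib.SequenceMatcher` ratio (the application does not name a similarity measure), the homoglyph table covers only a few Cyrillic look-alikes, Punycode labels are decoded with Python's built-in `idna` codec, and `SAFE_DOMAINS` is a hypothetical one-entry list. The two intervals use the example values given above.

```python
from difflib import SequenceMatcher

SAFE_DOMAINS = ["paypal.com"]       # hypothetical safe list
FIRST_INTERVAL = (0.70, 0.95)       # example values from the description
SECOND_INTERVAL = (0.60, 0.85)

# Assumed, deliberately tiny homoglyph table: Cyrillic letters -> Latin look-alikes.
HOMOGLYPHS = str.maketrans({
    "\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0440": "p",
    "\u0441": "c", "\u0443": "y", "\u0445": "x",
})


def similarity(a, b):
    """Similarity in [0, 1]; SequenceMatcher is an assumed stand-in measure."""
    return SequenceMatcher(None, a, b).ratio()


def within(domain, interval):
    low, high = interval
    return any(low <= similarity(domain, safe) <= high for safe in SAFE_DOMAINS)


def decode_punycode(domain):
    """Turn 'xn--' labels back into Unicode; leave the domain as-is on failure."""
    try:
        return domain.encode("ascii").decode("idna")
    except UnicodeError:
        return domain


def replace_characters(domain):
    return decode_punycode(domain).translate(HOMOGLYPHS)


def is_phishing(suspicious):
    if any(within(d, FIRST_INTERVAL) for d in suspicious):        # S202
        return True                                               # S206
    replaced = {replace_characters(d) for d in suspicious}        # S203
    return any(within(d, SECOND_INTERVAL) for d in replaced)      # S204


print(is_phishing({"paypa1.com"}))
print(is_phishing({"\u0440\u0430\u0443\u0440\u0430l-login.com"}))
print(is_phishing({"weather-news.org"}))
```

In this sketch the first domain is caught by the first interval, the second only after homoglyph replacement, and the third falls through to the normal-mail branch. A production table of look-alike characters would need to be far larger than the seven entries assumed here.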

Referring to fig. 3, fig. 3 is a flowchart of a method for detecting phishing mails based on keywords according to an embodiment of the present application, which is a further description of the method for identifying phishing mails in the embodiment corresponding to fig. 2, and a further implementation manner can be obtained by combining the embodiment with the embodiment corresponding to fig. 2, where the embodiment may include the following steps:

S301: if the similarity between the new suspicious URL set and the safe URL link is not within the second similarity interval, extracting the core keyword of each safe domain name in the safe domain name set;

After the suspicious URL set is obtained, this embodiment may extract the difference set between the first-level domain name and the top-level domain name of each safe domain name in the safe domain name set as the core keyword. To illustrate, for the safe domain name www.abc.com, the difference between the first-level domain name, abc.com, and the top-level domain name, com, is abc.

S302: judging whether the suspicious URL set includes the core keyword; if yes, entering S303; if not, entering S304;

S303: judging that the target mail is a phishing mail;

S304: judging that the target mail is a normal mail.

If the similarity between the suspicious URL set and the safe URL link is low and the similarity between the suspicious URL set after character replacement and the safe URL link is low, a situation that a hacker forges the phishing mail by adding the core keyword may exist. Therefore, whether the suspicious URL set comprises the core keywords or not is further judged, and the coverage rate of phishing mail detection is improved.
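The keyword check of S301 to S304 can be sketched as follows. Taking the difference between the first-level and top-level domain names as the second label from the right follows the description; the function names and the substring test used for "includes the core keyword" are assumptions.

```python
def core_keyword(safe_domain):
    """First-level domain minus top-level domain, e.g. 'www.abc.com' -> 'abc'."""
    labels = safe_domain.lower().split(".")
    return labels[-2] if len(labels) >= 2 else ""


def contains_core_keyword(suspicious_urls, safe_domains):
    """S302: True if any suspicious entry contains any core keyword."""
    keywords = {core_keyword(d) for d in safe_domains} - {""}
    return any(k in url.lower() for url in suspicious_urls for k in keywords)


print(core_keyword("www.abc.com"))
print(contains_core_keyword({"abc.secure-login.net"}, {"www.abc.com"}))
```

A plain substring test flags any entry that happens to contain a short keyword, so very short core keywords may need extra handling in practice.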

In the related art, phishing mail detection generally relies on social engineering analysis, UEBA technology, binary text classification models and similar approaches, but these approaches do not reliably detect phishing mails in which hackers have carefully constructed confusing URLs, and there is currently no phishing mail detection technology dedicated to identifying confusing URLs.

The flow described in the above embodiments is illustrated below with an embodiment from practical application. Referring to fig. 4, fig. 4 is a flowchart of a phishing mail detection method based on identifying confusing URLs according to an embodiment of the present application; this embodiment provides a phishing mail detection method that identifies confusing URLs, so as to fill this gap. In this embodiment, the URLs present in a mail are first detected and the first-level domain name is extracted from each URL. It is then judged whether the first-level domain name exists in a locally loaded well-known domain name set (namely, the safe domain name set). If not, the similarity between the first-level domain name and each locally stored well-known domain name (namely, safe domain name) is calculated one by one; if the similarity is high, the mail is a high-risk phishing mail. If the similarity is low, the core keywords (the difference between the first-level domain name and the top-level domain name) are loaded, and it is judged whether a core keyword exists in the URL link of the mail; if so, the mail can be judged to be a high-risk phishing mail. This embodiment may include the following steps:

Step 1: loading the user's mail log data and detecting whether a URL link exists in the mail text; if yes, entering Step 2; if not, directly judging that the mail is a normal mail.

Step 2: deduplicating the URL links and obtaining the corresponding first-level domain names; if the first-level domain names intersect the known domain name set, deleting the intersection and taking the remaining content as the suspicious URL set; if the suspicious URL set is empty, directly judging that the mail is a normal mail.

Step 3: traversing the suspicious URL set one entry at a time and calculating the similarity between each entry and the known URL links; if the similarity is within the range between the lower and upper thresholds, judging that the mail is a high-risk phishing mail; if it is not within that range, continuing to the next detection step.

Step 4: performing similar-character replacement based on Punycode on the suspicious URL set one entry at a time, and then calculating the similarity between the replaced suspicious URL set and the well-known URLs; if the similarity is within the range between the lower and upper thresholds, judging that the mail is a high-risk phishing mail; if not, continuing to the next detection step.

Step 5: locally loading the core keywords of the well-known domain names, that is, the difference set between the first-level domain name and the top-level domain name, and judging whether a keyword exists in the suspicious URL set; if so, judging that the mail is a phishing mail; otherwise, judging that the mail is a normal mail.
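Steps 1 to 5 can be combined into one end-to-end sketch. Everything below is an illustrative assumption rather than the implementation of the application: the URL pattern, the `difflib.SequenceMatcher` similarity measure, the small homoglyph table, the one-entry `SAFE_DOMAINS` set, and the reuse of the 70% to 95% and 60% to 85% example intervals as the lower and upper thresholds.

```python
import re
from difflib import SequenceMatcher
from urllib.parse import urlparse

SAFE_DOMAINS = {"paypal.com"}                 # hypothetical well-known domain set
FIRST_INTERVAL = (0.70, 0.95)                 # assumed thresholds for Step 3
SECOND_INTERVAL = (0.60, 0.85)                # assumed thresholds for Step 4
HOMOGLYPHS = str.maketrans({"\u0430": "a", "\u0440": "p", "\u0443": "y"})


def first_level_domain(url):
    labels = (urlparse(url).hostname or "").split(".")
    return ".".join(labels[-2:])


def in_interval(domain, interval):
    low, high = interval
    return any(low <= SequenceMatcher(None, domain, s).ratio() <= high
               for s in SAFE_DOMAINS)


def detect(mail_text):
    # Step 1: look for URL links in the mail text.
    urls = re.findall(r"https?://[^\s<>\"']+", mail_text)
    if not urls:
        return "normal"
    # Step 2: deduplicate, take first-level domains, drop the known ones.
    suspicious = {first_level_domain(u) for u in urls} - SAFE_DOMAINS
    if not suspicious:
        return "normal"
    # Step 3: similarity against the well-known domains.
    if any(in_interval(d, FIRST_INTERVAL) for d in suspicious):
        return "phishing"
    # Step 4: similarity again after look-alike character replacement.
    replaced = {d.translate(HOMOGLYPHS) for d in suspicious}
    if any(in_interval(d, SECOND_INTERVAL) for d in replaced):
        return "phishing"
    # Step 5: core keyword (first-level minus top-level domain) lookup.
    keywords = {s.split(".")[-2] for s in SAFE_DOMAINS}
    if any(k in d for d in suspicious for k in keywords):
        return "phishing"
    return "normal"


for text in ("no links here",
             "see http://www.paypal.com/home",
             "verify at http://paypa1.com/login",
             "verify at http://paypal-secure.net/x",
             "forecast: http://weather-news.org/"):
    print(text, "->", detect(text))
```

With these assumed inputs, the third mail is caught by the similarity check in Step 3 and the fourth by the keyword check in Step 5, while the others are judged normal.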

Existing phishing mail detection methods do not perform similarity matching between the page content of the URLs in a mail and the page content of well-known URLs, do not perform Punycode or isomorphic character (homoglyph) replacement, and do not use the difference set between the first-level domain name and the top-level domain name to detect confusing URLs. However, in many current phishing mails hackers imitate well-known URLs, using techniques that include Punycode and isomorphic character replacement as well as placing the keywords of a well-known domain name elsewhere in the URL, in order to gain the user's trust and cause the user's account information to be leaked. By identifying these URL confusion techniques, phishing mail attacks can be prevented and users can be protected from loss.

Referring to fig. 5, fig. 5 is a schematic structural diagram of a detection device for phishing mails according to an embodiment of the present application;

the apparatus may include:

a domain name determining module 501, configured to obtain at least one uniform resource locator URL link in a target email, and determine at least one primary domain name in the at least one URL link;

a suspicious URL determining module 502, configured to delete a known domain name in the at least one first-level domain name to obtain a suspicious URL set; wherein the known domain name is an intersection of the at least one primary domain name and a set of security domain names;

and the judging module 503 is configured to output a phishing mail detection result according to the similarity between the suspicious URL set and the secure URL link if the suspicious URL set is not empty.

In this embodiment, at least one URL link in the target mail is acquired, and the at least one first-level domain name of the URL link is compared with the known domain names. Because a phishing mail mainly relies on URL links to redirect the user to another page, a first-level domain name that belongs to the safe domain name set indicates that the address corresponding to the URL link is not a forged web page. If the suspicious URL set is not empty, the URL link contains domain names outside the safe domain name set, and the similarity between the suspicious URL set and the safe URL link is then evaluated. Because detection is based on the URL link content of the target mail, it is not affected by changes to the mailbox name or the sender IP address, disguised phishing mails can be effectively identified, and the accuracy of phishing mail detection is improved.

Further, the determining module 503 is configured to output a detection result that the target email is a phishing email if the similarity between the suspicious URL set and the safe URL link is within a first similarity interval.

Further, the apparatus may also include:

a character replacement module, configured to, if the similarity between the suspicious URL set and the safe URL link is not within the first similarity interval, perform character replacement on the suspicious URL set to obtain a new suspicious URL set, and judge whether the similarity between the new suspicious URL set and the safe URL link is within a second similarity interval.

Further, the apparatus may also include:

a new set detection module, configured to output the detection result that the target mail is a phishing mail if the similarity between the new suspicious URL set and the safe URL link is within the second similarity interval.

Further, the determining module 503 includes:

and the character replacing unit is used for carrying out isomorphic character replacement and/or punycode code replacement on the suspicious URL set to obtain the new suspicious URL set.

Further, the apparatus may also include:

a keyword extraction module, configured to extract the core keyword of each safe domain name in the safe domain name set if the similarity between the new suspicious URL set and the safe URL link is not within the second similarity interval;

a keyword detection module, configured to determine whether the suspicious URL set includes the core keyword, and if so, to judge that the target mail is a phishing mail.

Further, the keyword extraction module is configured to extract a difference set of a first-level domain name and a top-level domain name of each security domain name in the security domain name set as the core keyword.

Further, the apparatus may also include:

a deduplication module, configured to remove repeated segments in the first-level domain name after at least one first-level domain name in at least one URL link is determined.

Further, the apparatus may also include:

a blacklist maintenance module, configured to add the first-level domain name in the URL link to a URL blacklist and mark the camouflage type of the URL link if the target mail is a phishing mail.

Since the embodiments of the apparatus portion and the method portion correspond to each other, please refer to the description of the embodiments of the method portion for the embodiments of the apparatus portion, which is not repeated here.

The present application also provides a storage medium having a computer program stored thereon which, when executed, may implement the steps provided by the above-described embodiments. The storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

The present application further provides an electronic device, which may include a memory and a processor, where the memory stores a computer program, and when the processor calls the computer program in the memory, the steps provided in the foregoing embodiments may be implemented. Of course, the electronic device may also include various network interfaces, power supplies, and the like.

The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.

It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.

Claims

1. A method for detecting phishing mails, comprising:

2. A phishing mail detection method as claimed in claim 1 wherein outputting phishing mail detection results based on similarity of said suspect URL set and said secure URL link comprises:

3. A method for detecting phishing mails according to claim 2, further comprising:

4. A phishing mail detection method according to claim 2 or 3, further comprising:

5. A phishing mail detection method according to claim 3 wherein character replacement of the set of suspect URLs to obtain a new set of suspect URLs comprises:

6. A method for detecting phishing mails according to claim 3, further comprising:

judging whether the suspicious URL set comprises the core keyword or not;

and if so, judging that the target mail is a phishing mail.

7. The method of detecting phishing mails according to claim 6, wherein extracting the core keyword of each of the set of safe domain names comprises:

8. A phishing mail detection method as claimed in claim 1 further comprising after determining at least one primary domain name in at least one of said URL links:

and removing repeated segments in the primary domain name.

9. A phishing mail detection method as claimed in any one of claims 1 to 8 further comprising:

10. A phishing mail detection apparatus comprising:

11. An electronic device comprising a memory in which a computer program is stored and a processor which, when calling the computer program in the memory, carries out the steps of the method for detecting phishing mails according to any one of claims 1 to 9.

12. A storage medium having stored thereon computer-executable instructions which, when loaded and executed by a processor, carry out the steps of a method of detecting phishing mails according to any one of claims 1 to 9.