CN111654472B - Domain name detection method and device - Google Patents


Publication number
CN111654472B
Authority
CN
China
Prior art keywords
detected
domain name
idn
feature sequence
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010408567.8A
Other languages
Chinese (zh)
Other versions
CN111654472A (en)
Inventor
朱仕阳
李春江
高福海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Asiainfo Security Technology Co ltd
Asiainfo Technologies (chengdu) Inc
Original Assignee
Asiainfo Technologies (chengdu) Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Asiainfo Technologies (chengdu) Inc
Priority to CN202010408567.8A
Publication of CN111654472A
Application granted
Publication of CN111654472B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1466 Active attacks involving interception, injection, modification, spoofing of data unit addresses, e.g. hijacking, packet injection or TCP sequence number attacks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/45 Network directories; Name-to-address mapping
    • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1483 Countermeasures against malicious traffic service impersonation, e.g. phishing, pharming or web spoofing

Abstract

The invention discloses a domain name detection method and device, relates to the technical field of computers, and is used for detecting whether an internationalized domain name (IDN) is illegal. The method comprises the following steps: acquiring a first feature sequence, where the first feature sequence uniquely identifies the IDN to be detected; determining the similarity between the first feature sequence and each feature sequence in a stored feature sequence set, where each feature sequence in the set uniquely identifies a legal domain name; and if the target similarity is greater than a first preset threshold, determining that the IDN to be detected is illegal, where the target similarity is the highest of the determined similarities. The embodiment of the invention is applied to detecting illegal domain names.

Description

Domain name detection method and device
Technical Field
The invention relates to the technical field of computers, in particular to a domain name detection method and device.
Background
Internationalized Domain Names (IDNs) are internet domain names composed entirely or partially of characters from particular languages (e.g., Chinese, Arabic, Slavic languages, etc.). An IDN and a domain name composed of English characters (for convenience of description, referred to below as a Domain Name (DN)) can be converted into each other. For example, in fig. 1, domain name one is a DN, domain name two is an IDN, and domain name one and domain name two can be converted into each other.
In practical applications, different languages may have visually similar characters; for example, some Cyrillic letters resemble Latin letters. Many hackers exploit this property to disguise an illegal IDN as a legal domain name. For example, in fig. 1, domain name two is an illegal IDN and domain name three is a legal domain name; a hacker would typically disguise domain name two as domain name three. Thus, without being aware of it, a user may trigger the device to communicate with the Internet Protocol (IP) address corresponding to an illegal IDN, which may cause information leakage from the device and create potential safety hazards.
After receiving a DN, an existing Domain Name System (DNS) server determines whether the received DN is in a blacklist (the blacklist includes the illegal DNs found so far), and determines whether the received DN is legal according to the result. However, if a hacker creates a variant of an illegal IDN, a new illegal IDN is generated, and the DN corresponding to the new illegal IDN does not exist in the blacklist until the new illegal IDN has been identified as an illegal domain name. As a result, after receiving such a DN, the DNS server cannot determine through blacklist detection whether it is an illegal domain name, and the security of data is still compromised. Therefore, how to detect an illegal IDN is a technical problem that urgently needs to be solved.
Disclosure of Invention
The invention provides a domain name detection method and device, which are used for detecting an illegal domain name.
In order to achieve the above purpose, the embodiment of the invention adopts the following technical scheme:
In a first aspect, a domain name detection method is provided, and the method includes: acquiring a first feature sequence, where the first feature sequence uniquely identifies the internationalized domain name (IDN) to be detected; determining the similarity between the first feature sequence and each feature sequence in a stored feature sequence set, where each feature sequence in the set uniquely identifies a legal domain name; and if the target similarity is greater than a first preset threshold, determining that the IDN to be detected is illegal, where the target similarity is the highest of the determined similarities.
In a second aspect, a domain name detection apparatus is provided, which includes an acquisition unit and a determination unit. The acquisition unit is configured to acquire a first feature sequence, where the first feature sequence uniquely identifies the IDN to be detected. The determination unit is configured to determine, according to the first feature sequence acquired by the acquisition unit, the similarity between the first feature sequence and each feature sequence in a stored feature sequence set, where each feature sequence in the set uniquely identifies a legal domain name. The determination unit is further configured to determine that the IDN to be detected is illegal if the target similarity is greater than a first preset threshold, where the target similarity is the highest of the determined similarities.
In a third aspect, a computer readable storage medium storing one or more programs is provided, wherein the one or more programs include instructions which, when executed by a computer, cause the computer to perform the domain name detection method as in the first aspect.
In a fourth aspect, a domain name detection apparatus is provided, which includes: a processor, a memory, and a communication interface; the communication interface is used for communicating the domain name detection device with other equipment or a network; the memory is used for storing one or more programs, the one or more programs comprising computer executable instructions, and when the domain name detection apparatus is running, the processor executes the computer executable instructions stored in the memory to make the domain name detection apparatus execute the domain name detection method according to the first aspect.
In a fifth aspect, there is provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the domain name detection method of the first aspect.
In a sixth aspect, a domain name detection system is provided, which comprises a network device and the domain name detection apparatus according to the second aspect.
The domain name detection method provided by the embodiment of the invention is applied to detecting illegal domain names. The method comprises the following steps: acquiring a first feature sequence, where the first feature sequence uniquely identifies the IDN to be detected; determining the similarity between the first feature sequence and each feature sequence in a stored feature sequence set, where each feature sequence in the set uniquely identifies a legal domain name; and if the target similarity is greater than a first preset threshold, determining that the IDN to be detected is illegal, where the target similarity is the highest of the determined similarities. Because the first feature sequence uniquely identifies the IDN to be detected and each feature sequence in the set uniquely identifies a legal domain name, the similarity between the first feature sequence and each feature sequence in the set reflects how closely the IDN to be detected resembles each of a plurality of legal domain names. Therefore, if the highest of the determined similarities exceeds the first preset threshold, it can be determined that the IDN to be detected is disguised as a legal domain name. In this way, illegal IDNs can be detected.
Drawings
Fig. 1 is a first schematic diagram of domain name categories according to an embodiment of the present invention;
Fig. 2 is a first schematic structural diagram of a domain name detection system according to an embodiment of the present invention;
Fig. 3 is a first schematic flowchart of a domain name detection method according to an embodiment of the present invention;
Fig. 4 is a second schematic flowchart of a domain name detection method according to an embodiment of the present invention;
Fig. 5 is a first schematic diagram of an image according to an embodiment of the present invention;
Fig. 6 is a third schematic flowchart of a domain name detection method according to an embodiment of the present invention;
Fig. 7 is a fourth schematic flowchart of a domain name detection method according to an embodiment of the present invention;
Fig. 8 is a fifth schematic flowchart of a domain name detection method according to an embodiment of the present invention;
Fig. 9 is a first schematic diagram of a frequency coefficient matrix according to an embodiment of the present invention;
Fig. 10 is a sixth schematic flowchart of a domain name detection method according to an embodiment of the present invention;
Fig. 11 is a seventh schematic flowchart of a domain name detection method according to an embodiment of the present invention;
Fig. 12 is an eighth schematic flowchart of a domain name detection method according to an embodiment of the present invention;
Fig. 13 is a first schematic diagram of a feature sequence set according to an embodiment of the present invention;
Fig. 14 is a ninth schematic flowchart of a domain name detection method according to an embodiment of the present invention;
Fig. 15 is a tenth schematic flowchart of a domain name detection method according to an embodiment of the present invention;
Fig. 16 is a first schematic structural diagram of a domain name detection apparatus according to an embodiment of the present invention;
Fig. 17 is a second schematic structural diagram of a domain name detection apparatus according to an embodiment of the present invention;
Fig. 18 is a third schematic structural diagram of a domain name detection apparatus according to an embodiment of the present invention;
Fig. 19 is a fourth schematic structural diagram of a domain name detection apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
In the description of the present invention, "/" means "or" unless otherwise specified; for example, A/B may mean A or B. "And/or" herein merely describes an association between associated objects and indicates that three relationships are possible; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. Further, "at least one" means one or more, and "a plurality" means two or more. The terms "first", "second", and the like do not limit the number or the execution order, and items so labeled are not necessarily different.
The inventive concept of the present invention is described below. At present, operation and maintenance personnel set a blacklist and a whitelist in a DNS server or a gateway; the blacklist stores the illegal DNs that have been discovered, and the whitelist stores the legal DNs that have been authenticated. After receiving a DN corresponding to an IDN, the DNS server or the gateway may determine whether the IDN corresponding to the DN is legal or illegal according to the entries stored in the blacklist and the whitelist.
Based on the above technique, the inventors found that after a hacker creates a variant of an existing illegal IDN, a new illegal IDN is generated. Accordingly, the new illegal IDN also corresponds to a new DN. After the DNS server or the gateway receives the new DN, it cannot determine whether the new IDN is an illegal IDN, because the DN corresponding to the new illegal IDN does not exist in the blacklist.
In view of the above technical problem, the present invention considers that when a hacker disguises an IDN, the key is to make it difficult for the user to tell the illegal IDN apart from the legal domain name it imitates. Therefore, if the new IDN and each legal domain name can be uniquely identified by some method, then after receiving an IDN, the DNS server can determine from the similarity between the identification results whether the new IDN is disguised as a legal domain name, and thus whether the new IDN is an illegal IDN.
Based on the inventive concept, the embodiment of the invention provides a domain name detection method, which judges whether the IDN to be detected is an illegal IDN or not by calculating the similarity between the characteristic sequence of the IDN to be detected and the characteristic sequence of a legal domain name.
The domain name detection method provided by the embodiment of the invention is applied to a domain name detection system. Fig. 2 shows a schematic structural diagram of the domain name detection system. As shown in fig. 2, the domain name detection system 10 includes a domain name detection device 11 and a network device 12. The domain name detecting device 11 and the network device 12 may be connected in a wired manner or in a wireless manner, which is not limited in the embodiment of the present invention.
The domain name detection device 11 may be configured to perform validity detection on the obtained domain name, and may also be configured to send a DN, receive the DN, perform interconversion between an IDN and a DN (for example, as shown in fig. 1, interconversion between a domain name one and a domain name two), and resolve the DN to generate an IP address.
The network device 12 may be a DNS server, or may be a firewall or gateway device.
It should be noted that the domain name detecting device 11 and the network device 12 may be independent devices, or may be integrated into the same device, which is not specifically limited in the present invention.
When the domain name detection device 11 and the network device 12 are integrated in the same device, the communication mode between the domain name detection device 11 and the network device 12 is the communication between the internal modules of the device. In this case, the communication flow between the two is the same as the "communication flow between the domain name detection device 11 and the network device 12" in the case where they are independent of each other.
In the following embodiments provided by the present invention, the present invention is described by taking an example in which the domain name detection device 11 and the network device 12 are set independently of each other.
The following describes a domain name detection method provided in an embodiment of the present invention, with reference to the domain name detection system 10 shown in fig. 2.
As shown in fig. 3, the domain name detection method provided in this embodiment includes S201 to S205:
S201, the domain name detecting device 11 obtains the first feature sequence.
The first feature sequence is used for uniquely identifying the internationalized domain name (IDN) to be detected.
For example, the first feature sequence may be a character string having a preset length.
S202, the domain name detecting device 11 determines a similarity between the first feature sequence and each feature sequence in the stored feature sequence set.
Each characteristic sequence in the characteristic sequence set is used for uniquely identifying a legal domain name.
It should be noted that each feature sequence in the feature sequence set is generated according to a legal domain name.
S203, the domain name detection device 11 determines the target similarity from the determined similarities.
The target similarity is the highest of the determined similarities.
S204, the domain name detecting device 11 determines whether the target similarity is greater than a first preset threshold.
It should be noted that the first preset threshold may be set in the domain name detecting device 11 by the operation and maintenance staff.
S205, if the target similarity is greater than the first preset threshold, the domain name detecting device 11 determines that the IDN to be detected is illegal.
In one design, after determining that the IDN to be detected is illegal, the domain name detecting device 11 may update the DN corresponding to the IDN to be detected to a blacklist.
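The flow of S201 to S205 can be sketched as follows. This is a minimal illustration only: the steps above do not specify how the similarity is computed, so the sketch assumes equal-length binary feature sequences and uses the fraction of matching positions (a Hamming-style similarity). The names `similarity` and `is_illegal_idn` and the threshold values are hypothetical.

```python
from typing import Iterable


def similarity(seq_a: str, seq_b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree.

    Assumption: the source does not name a similarity measure; this
    Hamming-style measure is used purely for illustration.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("feature sequences must be non-empty and equal in length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


def is_illegal_idn(first_seq: str, legal_seqs: Iterable[str], threshold: float) -> bool:
    """S202-S205: compare against every stored sequence and test the maximum."""
    similarities = [similarity(first_seq, s) for s in legal_seqs]  # S202
    if not similarities:
        return False  # nothing to compare against
    target_similarity = max(similarities)  # S203
    return target_similarity > threshold   # S204 and S205


# Hypothetical 4-bit sequences; real sequences would be much longer.
print(is_illegal_idn("1111", ["1110", "0000"], threshold=0.7))
```

With these made-up inputs the highest similarity is 0.75, so the IDN is flagged when the threshold is 0.7 but not when the threshold is 0.75, because the comparison is strictly "greater than".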
In this embodiment of the present invention, in order to obtain the first feature sequence, as shown in fig. 4 in combination with fig. 3, S201 may specifically include S2011-S2013:
S2011, the domain name detecting device 11 obtains the IDN to be detected.
As a possible implementation manner, the domain name detecting device 11 may receive the IDN to be detected sent by the network device 12.
As another possible implementation, the domain name detection apparatus 11 may also convert the DN into the IDN to be detected after receiving the DN sent by the network device 12.
S2012, the domain name detecting device 11 renders the IDN to be detected onto a preset image according to a preset rule to generate an image to be detected.
The preset rule specifically includes one or more of the character size, font type, character spacing, starting position and character layout with which the IDN to be detected is displayed in the preset image.
As a possible implementation manner, in order to reduce the amount of data to be processed for the image to be detected in subsequent steps, the preset image may be a blank image.
It should be noted that the size of the preset image is larger than the size of the IDN to be detected when displayed according to the preset rule. The generated image to be detected may be in JPEG or PNG format.
Exemplarily, fig. 5 (a) shows an optional image to be detected, which contains the IDN to be detected laid out according to the preset rule. The preset image may specifically comprise 64 × 64 pixels; the font type of the IDN to be detected is Arial, the font size is 10, and the characters of the IDN to be detected may be laid out from left to right and from top to bottom.
S2013, the domain name detection device 11 identifies the image to be detected to determine the first feature sequence.
The first characteristic sequence is specifically used for uniquely identifying the content of the IDN to be detected displayed in the image to be detected.
In this embodiment of the present invention, when the domain name detecting device 11 receives the DN sent by the network device 12, in order to ensure the reliability of subsequent domain name detection, as shown in fig. 6 in combination with fig. 4, S2011 may specifically include S20111-S20114:
S20111, the domain name detecting device 11 receives the DN sent by the network device 12.
S20112, the domain name detecting device 11 determines whether the DN sent by the network device 12 can be converted into an IDN.
As a possible implementation manner, the domain name detecting device 11 parses the DN sent by the network device 12 to determine whether the DN contains an identification code of an IDN.
The identification code of the IDN is automatically generated when the IDN is Punycode-encoded, before the network device 12 sends the DN to the domain name detecting device 11 (Punycode encoding is used to convert the IDN into the DN); the identification code marks a DN as corresponding to an IDN.
Illustratively, the identification code of the IDN may specifically be "xn" in domain name one shown in fig. 1.
In one case, if the identification code of the IDN exists in the DN, the domain name detecting device 11 determines that the DN sent by the network device 12 can be converted into an IDN, and performs S20113 described below.
In another case, if the identification code of the IDN does not exist in the DN, the domain name detecting device 11 resolves the DN to obtain the IP address.
It should be noted that, for the specific implementation of the domain name detecting device 11 resolving the DN to obtain the IP address, reference may be made to the prior art, and details are not described herein.
S20113, if it is determined that the DN sent by the network device 12 can be converted into an IDN, the domain name detecting device 11 queries whether the blacklist or the whitelist includes the DN sent by the network device 12.
It should be noted that the blacklist is a list of illegal domain names (including DNs corresponding to illegal IDNs and illegal non-IDN domain names); the whitelist is a list of legal domain names (including DNs corresponding to legal IDNs and legal non-IDN domain names). The blacklist and the whitelist may be stored in the domain name detecting device 11, or may be stored in a storage device capable of communicating with the domain name detecting device 11.
S20114, if it is determined that neither the blacklist nor the whitelist includes the DN sent by the network device 12, the domain name detecting device 11 performs Punycode decoding on the received DN to obtain the IDN to be detected.
Where Punycode decoding is used to convert the DN to the IDN.
It should be noted that, the implementation manner of performing Punycode decoding on the received DN by the domain name detection device 11 may specifically refer to the prior art, and details are not described here.
It can be understood that, in the foregoing S20111-S20114 provided by the embodiment of the present invention, when it is determined that a received DN can be converted into an IDN, the next detection action is performed, so that the validity of a subsequent domain name detection action is ensured; when the DN is determined not to exist in the blacklist and the white list, the DN is reflected to correspond to the IDN which is not discovered before, and the reliability of the subsequent domain name detection action is ensured.
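The screening in S20111-S20114 can be sketched as follows, for illustration only. The sketch assumes that the identification code is the standard Punycode label prefix `xn--`, that the blacklist and whitelist are simple in-memory sets, and that Python's built-in `idna` codec is an acceptable stand-in for the Punycode decoding step; the function name `idn_to_detect` is hypothetical.

```python
from typing import Optional, Set

ACE_PREFIX = "xn--"  # assumption: standard prefix of Punycode-encoded labels


def idn_to_detect(dn: str, blacklist: Set[str], whitelist: Set[str]) -> Optional[str]:
    """Return the IDN to be detected, or None if no detection is needed.

    None covers two cases: the DN has no IDN identification code (it would
    simply be resolved to an IP address), or the DN is already listed.
    """
    name = dn.lower()
    # S20112: can this DN be converted into an IDN?
    if not any(label.startswith(ACE_PREFIX) for label in name.split(".")):
        return None
    # S20113: is the DN already in the blacklist or the whitelist?
    if name in blacklist or name in whitelist:
        return None
    # S20114: Punycode-decode the DN to obtain the IDN to be detected.
    return name.encode("ascii").decode("idna")


# Build a sample DN by encoding a known IDN, then screen it.
sample_dn = "münchen.de".encode("idna").decode("ascii")
print(sample_dn, "->", idn_to_detect(sample_dn, set(), set()))
```

A DN that is already in either list is not decoded again, which matches the intent that only previously unseen IDNs go on to image-based detection.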
In the embodiment of the present invention, in order to identify the image to be detected and determine the first feature sequence, as shown in fig. 7 in combination with fig. 4, S2013 may specifically include S20131-S20132:
S20131, the domain name detecting device 11 determines a low-frequency coefficient matrix of the image to be detected according to the image to be detected and a Discrete Cosine Transform (DCT) algorithm.
Each low-frequency coefficient in the low-frequency coefficient matrix reflects the contour and gray-level distribution of the IDN to be detected in the image to be detected.
S20132, the domain name detecting device 11 encodes the low-frequency coefficient matrix of the image to be detected by using a preset encoding rule to generate the first feature sequence.
In one possible design, the first characteristic sequence provided by the embodiment of the present invention may be binary data.
The preset encoding rule may specifically include: if any low-frequency coefficient in the low-frequency coefficient matrix of the image to be detected is greater than or equal to a second preset threshold, the digit corresponding to that low-frequency coefficient in the first feature sequence is determined to be a first value; if the low-frequency coefficient is smaller than the second preset threshold, the digit corresponding to that low-frequency coefficient in the first feature sequence is determined to be a second value.
It should be noted that the second preset threshold may be the average of the low-frequency coefficients in the low-frequency coefficient matrix of the image to be detected, or may be set in the domain name detecting device 11 by operation and maintenance personnel. The first value may be 1 or 0; when the first value is 1, the second value is 0; when the first value is 0, the second value is 1.
Illustratively, when the low-frequency coefficient matrix of the image to be detected includes 15 × 15 elements, the first feature sequence is binary data consisting of 225 digits.
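The encoding in S20132 can be sketched as follows. This is an illustration under stated assumptions: coefficients at or above the threshold map to the first value and those below it map to the second value, the threshold defaults to the mean of the coefficients, and the coefficients are read row by row. The function name `encode_low_frequency` and the sample matrix are hypothetical.

```python
from typing import List, Optional


def encode_low_frequency(
    matrix: List[List[float]],
    threshold: Optional[float] = None,
    first_value: str = "1",
) -> str:
    """Turn a low-frequency coefficient matrix into a binary feature sequence."""
    if first_value not in ("0", "1"):
        raise ValueError("first_value must be '0' or '1'")
    second_value = "0" if first_value == "1" else "1"
    flat = [coeff for row in matrix for coeff in row]  # row-major order (assumption)
    if threshold is None:
        # Default second preset threshold: the mean of all coefficients.
        threshold = sum(flat) / len(flat)
    return "".join(first_value if c >= threshold else second_value for c in flat)


# Hypothetical 2 x 2 matrix; a 15 x 15 matrix would yield 225 digits.
print(encode_low_frequency([[1.0, 2.0], [3.0, 4.0]]))
```

Because the first and second values are interchangeable, swapping them only inverts every digit, so similarity comparisons stay consistent as long as the same choice is used for every sequence in the stored set.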
In the embodiment of the present invention, in order to determine the low-frequency coefficient matrix of the image to be detected, as shown in fig. 8 in combination with fig. 7, S20131 may specifically include Sa-Sc:
Sa, the domain name detecting device 11 obtains the gray matrix of the image to be detected.
The gray matrix of the image to be detected is used for reflecting the texture characteristics of the image to be detected.
As a possible implementation manner, the domain name detecting device 11 obtains the gray value of each pixel in the image to be detected, and determines the gray matrix of the image to be detected according to the gray value of each pixel.
It should be noted that, for a specific implementation manner of this step in the embodiment of the present invention, reference may be made to the prior art, and details are not described here.
Illustratively, taking the preset image includes 64 × 64 pixels as an example, the grayscale matrix obtained in this step is a 64 × 64 matrix.
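Step Sa can be sketched as follows. The text leaves the gray-value computation to the prior art, so this sketch assumes the image is available as rows of RGB tuples and applies the common ITU-R BT.601 luma weights; both the input layout and the weights are assumptions, and `to_gray_matrix` is a hypothetical name.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def to_gray_matrix(rgb_rows: List[List[Pixel]]) -> List[List[int]]:
    """Convert rows of (R, G, B) pixels into a matrix of gray values (0-255)."""
    return [
        # Assumption: BT.601 weighting; other gray conversions would also work.
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_rows
    ]


# A 64 x 64 all-white image yields a 64 x 64 gray matrix.
white_image = [[(255, 255, 255)] * 64 for _ in range(64)]
gray = to_gray_matrix(white_image)
print(len(gray), len(gray[0]))
```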
Sb, the domain name detection device 11 determines a frequency coefficient matrix of the image to be detected according to the gray matrix of the image to be detected and a DCT algorithm.
As a possible implementation manner, the domain name detection device 11 inputs the gray matrix of the image to be detected into the DCT algorithm to generate the frequency coefficient matrix of the image to be detected.
It should be noted that the frequency coefficient matrix of the image to be detected includes a low-frequency coefficient matrix and a high-frequency coefficient matrix of the image to be detected; and each element in the frequency coefficient matrix of the image to be detected is used for reflecting the intensity of change of each pixel gray level in the image to be detected.
Wherein, each element in the low-frequency coefficient matrix of the image to be detected is the low-frequency coefficient of the image to be detected; and each element in the high-frequency coefficient matrix of the image to be detected is the high-frequency coefficient of the image to be detected, and the high-frequency coefficient of the image to be detected is used for reflecting the detail information of the image to be detected.
It should be noted that, regarding the specific implementation method of the DCT algorithm in this step in the embodiment of the present invention, reference may be made to the prior art, and details are not described here.
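Step Sb can be sketched with a plain two-dimensional DCT. The text does not state which DCT variant or normalization is used, so this sketch assumes the orthonormal DCT-II applied first to rows and then to columns; `dct_1d` and `dct_2d` are hypothetical names, and a production system would more likely call an optimized library routine.

```python
import math
from typing import List, Sequence


def dct_1d(values: Sequence[float]) -> List[float]:
    """Orthonormal DCT-II of a one-dimensional sequence."""
    n = len(values)
    out = []
    for k in range(n):
        total = sum(
            values[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * total)
    return out


def dct_2d(gray: Sequence[Sequence[float]]) -> List[List[float]]:
    """Apply the 1-D DCT to every row, then to every column."""
    row_pass = [dct_1d(row) for row in gray]
    col_pass = [dct_1d(col) for col in zip(*row_pass)]  # one list per column
    return [list(row) for row in zip(*col_pass)]        # transpose back


# A flat 4 x 4 image has all its energy in the top-left coefficient.
freq = dct_2d([[1.0] * 4 for _ in range(4)])
print(round(freq[0][0], 6))
```

The output matrix has the same dimensions as the input, which agrees with the statement that a 64 × 64 image yields a 64 × 64 frequency coefficient matrix.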
And the Sc domain name detection device 11 determines a low-frequency coefficient matrix of the image to be detected from the frequency coefficient matrix of the image to be detected.
As a possible implementation mode, in the frequency coefficient matrix of the image to be detected, the alternating current coefficient is removed from the upper left corner area, and the low-frequency coefficient matrix of the image to be detected can be obtained.
The device comprises a frequency coefficient matrix, a first row element, a first column element, a second row element, a second column element and a third column element, wherein the upper left corner area of the frequency coefficient matrix of the image to be detected comprises a low-frequency coefficient matrix of the image to be detected; the lower right corner region of the frequency coefficient matrix comprises a high frequency coefficient matrix of the image to be detected.
It should be noted that the ac coefficient of the image to be detected is used to reflect the content displayed by the edge of the image to be detected.
For example, when the preset image includes 64 × 64 pixels, the frequency coefficient matrix obtained by applying the DCT to the preset image also contains 64 × 64 elements; the frequency coefficient matrix of the image to be detected is shown in fig. 9. Furthermore, the upper left corner region of the frequency coefficient matrix contains 16 × 16 elements, comprising the AC coefficients and the low-frequency coefficients of the image to be detected. Of these, 31 elements are AC coefficients, and the remaining 15 × 15 elements form the low-frequency coefficient matrix (16 × 16 − 15 × 15 = 31).
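The element counts in this example can be reproduced with a short sketch. Which 31 elements of the 16 × 16 region are removed is not stated explicitly; the code assumes they are the first row and first column of that region (16 + 15 = 31), and the function name `low_frequency_matrix` is hypothetical.

```python
def low_frequency_matrix(coeffs, region=16):
    """Step Sc sketch: keep the upper-left region, then drop its first row
    and first column.

    Assumption: the removed coefficients are exactly the first row and
    first column of the region; the text only gives their count (31).
    """
    block = [row[:region] for row in coeffs[:region]]
    return [row[1:] for row in block[1:]]


# Placeholder 64 x 64 coefficient matrix; the values only mark positions.
coeffs = [[r * 64 + c for c in range(64)] for r in range(64)]
low = low_frequency_matrix(coeffs)
removed = 16 * 16 - len(low) * len(low[0])
```

With these assumptions, `low` has 15 rows of 15 elements and `removed` equals 31, in agreement with the figures given for fig. 9.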
In a possible design, in order to reduce the data processing performed on the image to be detected, as shown in fig. 10 in conjunction with fig. 8, Sa in the embodiment of the present invention may specifically include Sa1-Sa2:
Sa1, the domain name detection device 11 performs image binarization processing on the image to be detected to generate an intermediate image.
As a possible implementation manner, the domain name detection device 11 sets the gray value of every pixel in the image to be detected to either 0 or 255, so that the image to be detected is rendered in black and white only.
In the intermediate image, the gray scale value of the pixels included in the IDN to be detected is 255, and the gray scale value of the pixels in the background portion is 0.
Sa2, the domain name detection device 11 acquires the gray matrix of the intermediate image.
Note that, a specific implementation of this step may refer to step Sa, which is not described herein again.
It can be understood that, in the embodiment of the present invention, by using the Sa1-Sa2, the gray value of each pixel in the background of the image to be detected is set to 0, and only the gray value of IDN to be detected is retained, that is, the ground color in the background of the image to be detected is removed, so that the calculation pressure of the domain name detection device is reduced in the subsequent data processing operation, and the hardware resources are saved.
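A minimal sketch of Sa1 follows. The text fixes only the output values (255 for pixels of the IDN, 0 for the background); the fixed threshold of 128 and the assumption that glyph strokes are darker than the background are illustrative choices, not part of the embodiment.

```python
def binarize(gray, threshold=128):
    """Sa1 sketch: map every pixel to 0 or 255.

    Assumption: glyphs are darker than the background, so pixels below the
    threshold are treated as IDN strokes (255) and all others as background (0).
    """
    return [[255 if pixel < threshold else 0 for pixel in row] for row in gray]


# Light background (250) with two dark stroke pixels (30 and 40).
image = [
    [250, 250, 250, 250],
    [250, 30, 40, 250],
    [250, 250, 250, 250],
]
intermediate = binarize(image)
```

The returned list of lists is already the gray matrix of the intermediate image required by Sa2, so no separate conversion is needed in this sketch.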
In one design, when the first feature sequence and each feature sequence in the feature sequence set are binary data, in order to determine whether the IDN to be detected is an illegal IDN, as shown in fig. 11 with reference to fig. 3, S202 in the embodiment of the present invention specifically includes S2021:
S2021, the domain name detection device 11 calculates the edit distance between the first feature sequence and each feature sequence in the stored feature sequence set.
Each calculated edit distance reflects the similarity between the first feature sequence and one feature sequence in the feature sequence set. The smaller the edit distance between two feature sequences, the higher their similarity.
In addition, each feature sequence in the feature sequence set is generated in the same manner as the first feature sequence and has the same number of code bits as the first feature sequence.
It should be noted that, in this step provided in the embodiment of the present invention, reference may be made to the prior art for a specific method for calculating an edit distance between two binary data, and details are not described herein again.
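For readers who want a concrete reference point, the sketch below computes the classic Levenshtein edit distance between two binary strings. The text defers to the prior art without naming a variant, so choosing Levenshtein distance, and representing feature sequences as strings of '0' and '1', are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences, using two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete ca
                    curr[j - 1] + 1,     # insert cb
                    prev[j - 1] + cost,  # substitute, or keep when equal
                )
            )
        prev = curr
    return prev[-1]


same = edit_distance("10110010", "10110010")
one_flip = edit_distance("10110010", "10100010")
```

Identical sequences give a distance of 0, and each differing bit in equal-length sequences adds at most 1, which is why a smaller distance indicates a higher similarity.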
In the case that the first feature sequence and each feature sequence in the feature sequence set are binary data, and the editing distance between the binary data reflects the similarity between the feature sequences, S203 in the embodiment of the present invention specifically includes S2031:
S2031, the domain name detection apparatus 11 determines the editing distance with the smallest numerical value among the calculated editing distances as the target editing distance.
In the case that the first feature sequence and each feature sequence in the feature sequence set are binary data, and the editing distance between the binary data reflects the similarity between the feature sequences, S204 in the embodiment of the present invention specifically includes S2041:
S2041, the domain name detection device 11 determines whether the target edit distance is smaller than a third preset threshold.
It should be noted that the third preset threshold decreases with the increase of the first preset threshold, and the third preset threshold may also be set in the domain name detecting device 11 by the operation and maintenance staff.
In the case that the first feature sequence and each feature sequence in the feature sequence set are binary data, and the editing distance between the binary data reflects the similarity between the feature sequences, S205 provided in the embodiment of the present invention specifically includes S2051:
S2051, if the target edit distance is less than or equal to the third preset threshold, the domain name detection device 11 determines that the IDN to be detected is illegal.
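Steps S2031 to S2051 can be sketched as follows. The distance function is passed in as a parameter; the `hamming` helper is only a stand-in that counts differing positions in equal-length sequences and is not claimed to be the edit distance used by the embodiment. The function names and the handling of an empty feature sequence set are likewise assumptions.

```python
def is_illegal_by_distance(first_sequence, feature_set, third_threshold, distance):
    """Return (illegal, target_distance) for the IDN to be detected.

    S2031: the smallest distance is the target edit distance.
    S2041/S2051: the IDN is illegal when that distance is <= the threshold.
    Assumption: an empty feature set yields (False, None).
    """
    if not feature_set:
        return False, None
    target = min(distance(first_sequence, seq) for seq in feature_set)
    return target <= third_threshold, target


def hamming(a, b):
    """Stand-in distance: number of differing positions (equal lengths assumed)."""
    return sum(x != y for x, y in zip(a, b))


close = is_illegal_by_distance("1011", ["1010", "0000"], 1, hamming)
strict = is_illegal_by_distance("1011", ["1010", "0000"], 0, hamming)
```

Here `close` flags the sequence because its nearest neighbour differs in one bit, while the stricter threshold in `strict` does not.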
In this embodiment of the present invention, in order to determine the feature sequence set, as shown in fig. 12 in combination with fig. 3, before S202, the domain name detection method provided in this embodiment of the present invention specifically includes S1-S3:
S1, the domain name detection device 11 acquires a plurality of legal domain names.
For example, each legal domain name in the plurality of legal domain names may be as shown by domain name three in fig. 1.
In one design, the plurality of legal domain names in this step may also include legal IDNs, for example the legal domain name with sequence number 3 in fig. 13.
It can be understood that each legal domain name in the multiple legal domain names provided in the embodiment of the present invention may specifically be a legal IDN, and the domain name detection device 11 can determine whether the IDN to be detected is disguised as another legal IDN.
S2, the domain name detecting device 11 generates a feature sequence of each legal domain name in the legal domain names according to the legal domain names, the preset rule, and the preset image.
The feature sequence of each legal domain name in the legal domain names is used for uniquely identifying one legal domain name in the legal domain names.
It should be noted that the feature sequence of each legal domain name in the multiple legal domain names is specifically used to uniquely identify the content displayed by one legal domain name in the multiple legal domain names in the preset image according to the preset rule.
It should be noted that, for a specific implementation manner of this step in the embodiment of the present invention, reference may be made to the above-mentioned S2011-S2013, which is not described herein again.
In one design, in order to ensure the accuracy of detecting a domain name by the domain name detecting device 11, as shown in fig. 5 (b), when the S2 provided in the embodiment of the present invention generates a feature sequence of each legal domain name in a plurality of legal domain names, the adopted preset rule, preset image, DCT algorithm, and preset encoding rule are the same as the preset rule, preset image, DCT algorithm, and preset encoding rule in S2011-S2013.
S3, the domain name detecting device 11 stores the feature sequences of the legal domain names in the legal domain names to generate a feature sequence set.
It is understood that the above-mentioned S1-S3 provided by the embodiment of the present invention can generate a feature sequence set, and provide a data basis for determining the similarity in S202. Meanwhile, as the generation method and conditions of each feature sequence in the feature sequence set are the same as those of the first feature sequence, the accuracy of detecting the domain name by the domain name detection device 11 can be ensured.
Illustratively, the feature sequence set may be as shown in fig. 13; the feature sequence set includes the plurality of legal domain names and the feature sequence of each of those legal domain names.
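The construction of the feature sequence set in S1-S3 can be sketched as below. The real feature extraction (rendering into the preset image, DCT, encoding) is replaced by `placeholder_feature`, a hypothetical stand-in that does not uniquely identify a domain name; the record layout with `domain` and `feature` keys is also an assumption loosely modelled on the description of fig. 13.

```python
def build_feature_set(legal_domains, feature_fn):
    """S1-S3 sketch: compute and store one feature sequence per legal domain name."""
    return [{"domain": d, "feature": feature_fn(d)} for d in legal_domains]


def placeholder_feature(domain):
    """Hypothetical stand-in for S2011-S2013.

    It packs the sum of the code points into 16 bits so that the example
    runs; it is NOT the image-based feature sequence of the embodiment.
    """
    return format(sum(ord(ch) for ch in domain) % 65536, "016b")


feature_set = build_feature_set(["example.com", "example.org"], placeholder_feature)
# The IDN to be detected must go through the same function as the stored
# entries; "\u0430" is the Cyrillic letter that resembles Latin "a".
first_sequence = placeholder_feature("ex\u0430mple.com")
```

Passing a single `feature_fn` to both the stored entries and the IDN to be detected mirrors the requirement that the preset rule, preset image, DCT algorithm, and preset encoding rule be identical on both sides.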
In one design, the number of feature sequences contained in the feature sequence set is massive. Moreover, when disguising a domain name, a hacker sets the character string length of the illegal domain name to be the same as that of the legal domain name, so that it is difficult for a user to tell the two apart. Therefore, in order to reduce the workload of the domain name detection device 11 when determining the similarity and to save calculation time, as shown in fig. 13, the feature sequence set of the embodiment of the present invention further includes the character string length of each legal domain name in the plurality of legal domain names. Referring to fig. 3, as shown in fig. 14, S202 in the embodiment of the present invention further includes S2022-S2024:
S2022, the domain name detection device 11 obtains the length of the character string of the IDN to be detected.
S2023, the domain name detection device 11 determines a first target feature sequence in the feature sequence set according to the length of the character string of the IDN to be detected.
The character string length of the legal domain name corresponding to the first target characteristic sequence is the same as that of the IDN to be detected.
S2024, the domain name detecting device 11 calculates a similarity between the first feature sequence and the determined first target feature sequence.
It should be noted that, the similarity between the feature sequences calculated in this step may specifically refer to the above S2021, which is not described herein again.
It can be understood that, by using the above S2022-S2024 in the embodiment of the present invention, the legal domain name with the same length as the IDN character string to be detected can be screened from the massive feature sequences according to the first target feature sequence determined from the feature sequence set, so that the calculation pressure of the domain name detection device 11 can be reduced, and the calculation time can be saved.
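The length-based screening of S2022-S2023 can be sketched as a simple filter. The record layout with a `length` key and the choice to measure string length in Unicode code points are assumptions; the sample feature values are placeholders.

```python
def first_target_sequences(feature_set, idn):
    """S2022-S2023 sketch: keep entries whose legal domain name has the same
    string length as the IDN to be detected.

    Assumption: length is counted in Unicode code points via len().
    """
    length = len(idn)
    return [entry for entry in feature_set if entry["length"] == length]


feature_set = [
    {"domain": "example.com", "length": 11, "feature": "1010"},
    {"domain": "example.org", "length": 11, "feature": "1001"},
    {"domain": "iana.org", "length": 8, "feature": "0110"},
]
# "\u0430" is a Cyrillic letter standing in for Latin "a"; the length is still 11.
candidates = first_target_sequences(feature_set, "ex\u0430mple.com")
```

Only the two 11-character entries remain, so the similarity calculation in S2024 runs on two candidates instead of three; with a massive feature sequence set the saving grows accordingly.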
In another design, the number of feature sequences included in the feature sequence set is massive. Moreover, when disguising a domain name, a hacker chooses for the illegal domain name a language family whose characters are highly similar to the characters of the language family of the legal domain name, so that it is difficult for a user to tell the two apart. Therefore, in order to reduce the workload of the domain name detection device 11 when determining the similarity and to save calculation time, as shown in fig. 13, the feature sequence set of the embodiment of the present invention further includes the language family identifier of each legal domain name in the plurality of legal domain names; with reference to fig. 12, as shown in fig. 15, S202 in the embodiment of the present invention specifically includes S2025-S2028:
S2025, the domain name detection device 11 obtains the language family identifier of the IDN to be detected.
S2026, the domain name detection device 11 determines the target language family identifier according to the language family identifier of the IDN to be detected.
Wherein, the similarity between the character corresponding to the target language family and the character included in the language family of the IDN to be detected is larger than a fourth preset threshold.
As a possible implementation manner, the domain name detecting device 11 may query the target language family identifier from a language family corresponding list stored in advance; the language family correspondence list includes similarities between language families.
It should be noted that the fourth preset threshold may be specifically set in the domain name detection device 11 by the operation and maintenance staff.
S2027, the domain name detecting device 11 determines a second target feature sequence in the feature sequence set according to the target language family identifier.
And each second target characteristic sequence in the second target characteristic sequence set corresponds to the target language family identifier.
S2028, the domain name detecting device 11 calculates a similarity between the first feature sequence and the determined second target feature sequence.
It should be noted that, the similarity between the feature sequences calculated in this step may specifically refer to the above S2021, which is not described herein again.
It can be understood that, by using the above-mentioned S2025-S2028 in the embodiment of the present invention, the legal domain name which can be disguised by the IDN to be detected can be screened from the massive feature sequences according to the second target feature sequence determined from the feature sequence set, so that the calculation pressure of the domain name detection device 11 can be reduced, and the calculation time can be saved.
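Steps S2026 and S2027 can be sketched as a table lookup followed by a filter. The language family identifiers, the similarity scores, and the dictionary layout of the language family correspondence list are all invented for illustration; the embodiment does not specify them.

```python
# Hypothetical language family correspondence list. Keys are
# (family of the IDN to be detected, candidate family); values are
# made-up similarity scores used only to make the example run.
FAMILY_SIMILARITY = {
    ("cyrillic", "latin"): 0.9,
    ("cyrillic", "greek"): 0.6,
    ("cyrillic", "han"): 0.1,
}


def target_family_ids(idn_family, table, fourth_threshold):
    """S2026 sketch: families whose similarity exceeds the fourth preset threshold."""
    return {
        other
        for (family, other), score in table.items()
        if family == idn_family and score > fourth_threshold
    }


def second_target_sequences(feature_set, family_ids):
    """S2027 sketch: keep entries whose language family identifier is a target."""
    return [entry for entry in feature_set if entry["family"] in family_ids]


feature_set = [
    {"domain": "example.com", "family": "latin", "feature": "1010"},
    {"domain": "example.cn", "family": "han", "feature": "0110"},
]
families = target_family_ids("cyrillic", FAMILY_SIMILARITY, 0.5)
candidates = second_target_sequences(feature_set, families)
```

With a threshold of 0.5, only the Latin and Greek families qualify, so the entry tagged `han` is skipped before any similarity is calculated in S2028.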
In another design, in order to reduce the workload of the domain name detection device 11 when determining the similarity and to save calculation time, S202 provided in the embodiment of the present invention may further determine a third target feature sequence in the feature sequence set according to the determined first target feature sequence and the determined second target feature sequence, and calculate the similarity between the first feature sequence and the determined third target feature sequence; the third target feature sequence is the intersection of the determined first target feature sequence and the determined second target feature sequence.
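The intersection described in this design can be sketched as follows, assuming each entry carries a `domain` key that identifies it; that key and the sample values are hypothetical.

```python
def third_target_sequences(first_targets, second_targets):
    """Intersection of the length-filtered and family-filtered candidates.

    Assumption: entries are dictionaries identified by their "domain" value.
    """
    second_domains = {entry["domain"] for entry in second_targets}
    return [entry for entry in first_targets if entry["domain"] in second_domains]


by_length = [
    {"domain": "example.com", "feature": "1010"},
    {"domain": "example.org", "feature": "1001"},
]
by_family = [
    {"domain": "example.org", "feature": "1001"},
    {"domain": "iana.org", "feature": "0110"},
]
third = third_target_sequences(by_length, by_family)
```

Only entries that pass both screens remain, so the similarity is calculated for the smallest candidate set of the three designs.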
The domain name detection method provided by the embodiment of the invention is applied to detecting illegal domain names. The method judges whether the IDN to be detected pretends to be a legal domain name or not by calculating the similarity between the first characteristic sequence of the IDN to be detected and the characteristic sequence of the legal domain name. Because the first characteristic sequence is used for uniquely identifying the IDN to be detected, the characteristic sequences in the characteristic sequence set are used for uniquely identifying the legal domain name, and the similarity between the first characteristic sequence and each characteristic sequence in the characteristic sequence set can reflect the similarity between the IDN to be detected and a plurality of legal domain names. Furthermore, if the highest similarity among the determined similarities exceeds a first preset threshold, it can be determined that the IDN to be detected pretends to be a legal domain name. Finally, by adopting the technical means, the illegal IDN can be accurately detected.
The above description mainly introduces the solutions provided by the embodiments of the present invention from the perspective of methods. In order to implement the above functions, the domain name detection device includes a hardware structure and/or a software module for performing each function. Those of skill in the art will readily appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or as a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiment of the present invention, the domain name detection apparatus 11 may be divided into functional modules according to the above method example, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. Optionally, the division of the modules in the embodiment of the present invention is schematic, and is only a logic function division, and there may be another division manner in actual implementation.
Fig. 16 is a schematic structural diagram of a domain name detection apparatus according to an embodiment of the present invention. As shown in fig. 16, the domain name detection device 11 is configured to detect the validity of the obtained IDN, for example, to execute the domain name detection method shown in fig. 3. The domain name detection apparatus 11 includes an acquisition unit 111, a determination unit 112, and a judgment unit 113.
An obtaining unit 111 is configured to obtain a first feature sequence. The first feature sequence is used for uniquely identifying the internationalized domain name (IDN) to be detected. For example, in conjunction with fig. 3, the obtaining unit 111 may be configured to perform S201.
A determining unit 112, configured to determine, according to the first feature sequence acquired by the acquiring unit 111, a similarity between the first feature sequence and each feature sequence in the stored feature sequence set; each characteristic sequence in the characteristic sequence set is used for uniquely identifying a legal domain name. For example, in conjunction with fig. 3, the determination unit 112 may be configured to perform S202.
The determining unit 112 is further configured to determine a target similarity from the determined similarities; and the target similarity is the highest similarity in the determined similarities. For example, in conjunction with fig. 3, the determination unit 112 may be configured to perform S203.
A judging unit 113, configured to judge whether the target similarity determined by the determining unit 112 is greater than a first preset threshold. For example, in conjunction with fig. 3, the judging unit 113 may be configured to execute S204.
The determining unit 112 is further configured to determine that the IDN to be detected is illegal if the target similarity is greater than a first preset threshold. For example, in conjunction with fig. 3, the determination unit 112 may be configured to perform S205.
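The division of S202-S205 between the determining unit and the judging unit can be sketched as a small class. The class name, the method names, and the `bit_agreement` similarity measure are all hypothetical; the similarity function is injected because this passage does not fix one.

```python
class DomainNameDetector:
    """Sketch of the determining unit 112 and the judging unit 113."""

    def __init__(self, feature_set, similarity, first_threshold):
        self.feature_set = feature_set          # stored feature sequences
        self.similarity = similarity            # assumed pluggable measure
        self.first_threshold = first_threshold  # first preset threshold

    def determine_similarities(self, first_sequence):  # S202
        return [self.similarity(first_sequence, s) for s in self.feature_set]

    def target_similarity(self, similarities):  # S203
        return max(similarities)

    def judge(self, target):  # S204
        return target > self.first_threshold

    def detect(self, first_sequence):  # S202 through S205
        sims = self.determine_similarities(first_sequence)
        return self.judge(self.target_similarity(sims))


def bit_agreement(a, b):
    """Illustrative similarity: fraction of equal bits (equal lengths assumed)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


detector = DomainNameDetector(["1010", "0000"], bit_agreement, 0.7)
suspicious = detector.detect("1011")
harmless = detector.detect("0101")
```

The first query agrees with a stored sequence in three of four bits (0.75 > 0.7) and is flagged; the second reaches at most 0.5 and is not.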
Optionally, as shown in fig. 17, the obtaining unit 111 provided in the embodiment of the present invention specifically includes an obtaining subunit 1111, a generating subunit 1112, and a determining subunit 1113.
An obtaining subunit 1111, configured to obtain the IDN to be detected. For example, in connection with fig. 4, the obtaining subunit 1111 may be configured to perform S2011.
A generating sub-unit 1112, configured to load the IDN to be detected acquired by the acquiring sub-unit 1111 into a preset image according to a preset rule, so as to generate an image to be detected. For example, in conjunction with fig. 4, the generation subunit 1112 may be configured to perform S2012.
The determining subunit 1113 is configured to identify the to-be-detected image generated by the generating subunit 1112 to determine the first feature sequence. For example, in conjunction with fig. 4, the determination subunit 1113 may be configured to perform S2013.
Optionally, as shown in fig. 17, the determining subunit 1113 provided in the embodiment of the present invention is specifically configured to determine a low-frequency coefficient matrix of an image to be detected according to the image to be detected and a discrete cosine transform DCT algorithm; and each low-frequency coefficient in the low-frequency coefficient matrix of the image to be detected is used for reflecting the profile and the gray distribution of the IDN to be detected in the image to be detected. For example, in conjunction with fig. 7, the determining subunit 1113 may be configured to perform S20131.
The determining subunit 1113 is further configured to encode the low-frequency coefficient matrix of the image to be detected by using a preset encoding rule to generate a first feature sequence. For example, in conjunction with fig. 7, the determining subunit 1113 may be configured to perform S20132.
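The preset encoding rule is not restated in this passage. Purely as an assumption, the sketch below uses a rule common in perceptual hashing: each low-frequency coefficient is compared with the mean of all coefficients and encoded as one bit. The function name `encode_low_frequency` is hypothetical.

```python
def encode_low_frequency(low):
    """Encode a low-frequency coefficient matrix as a binary string.

    Assumed rule (not taken from the embodiment): a coefficient above the
    mean of all coefficients becomes '1', any other coefficient becomes '0'.
    """
    values = [value for row in low for value in row]
    mean = sum(values) / len(values)
    return "".join("1" if value > mean else "0" for value in values)


sequence = encode_low_frequency([[1.0, 5.0], [3.0, 7.0]])
```

Under this rule a 15 × 15 low-frequency coefficient matrix would yield a 225-bit sequence, and two matrices of the same size always produce sequences with the same number of code bits.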
Optionally, as shown in fig. 17, the domain name detection apparatus 11 provided in the embodiment of the present invention further includes a generation unit 114.
The obtaining unit 111 is further configured to obtain a plurality of legal domain names. For example, in conjunction with fig. 12, the obtaining unit 111 may be configured to execute S1.
A generating unit 114, configured to generate a feature sequence of each legal domain name in the multiple legal domain names according to the multiple legal domain names, the preset rule, and the preset image acquired by the acquiring unit 111; the feature sequence of each legal domain name in the legal domain names is used for uniquely identifying one legal domain name in the legal domain names. For example, in conjunction with fig. 12, the generating unit 114 may be configured to perform S2.
The generating unit 114 is further configured to, after generating the feature sequence of each legal domain name in the multiple legal domain names, store the feature sequences of each legal domain name in the multiple legal domain names to generate a feature sequence set. For example, in conjunction with fig. 12, the generating unit 114 may be configured to perform S3.
Optionally, the feature sequence set provided in the embodiment of the present invention further includes a string length of each legal domain name in the multiple legal domain names; as shown in fig. 17, the determining unit 112 of the embodiment of the present invention is further configured to obtain the length of the string of IDNs to be detected. For example, in connection with fig. 14, the determining unit 112 may be configured to execute S2022.
The determining unit 112 is further configured to determine, after obtaining the length of the character string of the IDN to be detected, a first target feature sequence of the feature sequence set according to the length of the character string of the IDN to be detected; the character string length of the legal domain name corresponding to the first target characteristic sequence is the same as that of the IDN to be detected. For example, in conjunction with fig. 14, the determination unit 112 may be configured to perform S2023.
The determining unit 112 is further specifically configured to calculate a similarity between the first feature sequence and the determined first target feature sequence. For example, in conjunction with fig. 14, the determination unit 112 may be configured to perform S2024.
Optionally, the feature sequence set provided in the embodiment of the present invention further includes a language family identifier of each legal domain name in the multiple legal domain names. As shown in FIG. 17, the determining unit 112 of the embodiment of the present invention is further used for obtaining the language family identifier of the IDN to be detected. For example, in conjunction with fig. 15, the determination unit 112 may be configured to perform S2025.
A determining unit 112, specifically, further configured to determine a target language family identifier according to the language family identifier of the IDN to be detected after acquiring the language family identifier of the IDN to be detected; wherein, the similarity between the character corresponding to the target language family mark and the character included in the language family of the IDN to be detected is larger than a fourth preset threshold. For example, in conjunction with fig. 15, the determination unit 112 may be configured to perform S2026.
The determining unit 112 is further configured to determine, after determining the target language family identifier, a second target feature sequence in the feature sequence set according to the target language family identifier; and each second target characteristic sequence in the second target characteristic sequence set corresponds to the target language family identifier. For example, in conjunction with fig. 15, the determination unit 112 may be configured to perform S2027.
The determining unit 112 is specifically further configured to calculate similarity between each second target feature sequence in the second target feature sequence set and the first feature sequence. For example, in conjunction with fig. 15, the determination unit 112 may be configured to perform S2028.
Optionally, as shown in fig. 17, the obtaining unit 111 provided in the embodiment of the present invention is specifically further configured to receive a DN sent by the network device 12. For example, in conjunction with fig. 6, the obtaining unit 111 may be configured to perform S20111.
The obtaining unit 111 is further configured to determine whether the DN sent by the network device 12 can be converted into the IDN. For example, in conjunction with fig. 6, the obtaining unit 111 may be configured to perform S20112.
The obtaining unit 111 is further configured to query whether the DN sent by the network device 12 is included in the blacklist and the whitelist if it is determined that the DN sent by the network device 12 can be converted into the IDN. For example, in conjunction with fig. 6, the obtaining unit 111 may be configured to perform S20113.
The obtaining unit 111 is specifically further configured to perform Punycode decoding on the received DN to obtain the IDN to be detected if it is determined that the DN sent by the network device 12 is included in neither the blacklist nor the whitelist. For example, in conjunction with fig. 6, the obtaining unit 111 may be configured to perform S20114.
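The flow of S20111-S20114 can be sketched with Python's built-in `punycode` codec. Two points are assumptions: a DN is treated as convertible into an IDN when at least one label starts with the ACE prefix `xn--`, and the function returns `None` for names that need no further detection, since the handling of listed names is outside this sketch. The function name `acquire_idn` is hypothetical.

```python
def acquire_idn(dn, blacklist, whitelist):
    """Return the decoded IDN to be detected, or None if detection is skipped."""
    dn = dn.lower()
    labels = dn.split(".")
    # S20112 (assumed test): convertible when some label has the "xn--" prefix.
    if not any(label.startswith("xn--") for label in labels):
        return None
    # S20113: names already present in either list are not decoded here.
    if dn in blacklist or dn in whitelist:
        return None
    # S20114: Punycode-decode every prefixed label; keep the others as they are.
    decoded = [
        label[4:].encode("ascii").decode("punycode")
        if label.startswith("xn--")
        else label
        for label in labels
    ]
    return ".".join(decoded)


idn = acquire_idn("xn--mnchen-3ya.de", set(), set())
plain = acquire_idn("example.com", set(), set())
listed = acquire_idn("xn--mnchen-3ya.de", {"xn--mnchen-3ya.de"}, set())
```

The first call yields `münchen.de`; the other two return `None` because the name is either not convertible or already listed.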
Optionally, as shown in fig. 17, the obtaining unit 111 provided in the embodiment of the present invention is further specifically configured to determine a low-frequency coefficient matrix of the image to be detected according to the image to be detected and a discrete cosine transform DCT algorithm after obtaining the image to be detected. For example, in conjunction with fig. 7, the obtaining unit 111 may be configured to perform S20131.
The obtaining unit 111 is further configured to, after determining the low-frequency coefficient matrix of the image to be detected, encode the low-frequency coefficient matrix of the image to be detected by using a preset encoding rule to generate a first feature sequence. For example, in conjunction with fig. 7, the obtaining unit 111 may be configured to perform S20132.
Optionally, as shown in fig. 17, the obtaining unit 111 provided in the embodiment of the present invention is specifically configured to obtain the gray matrix of the image to be detected. For example, in conjunction with fig. 8, the obtaining unit 111 may be configured to perform Sa.
The obtaining unit 111 is further configured to, after obtaining the gray matrix of the image to be detected, determine the frequency coefficient matrix of the image to be detected according to the gray matrix of the image to be detected and a DCT algorithm. For example, in connection with fig. 8, the acquisition unit 111 may be used to perform Sb.
The obtaining unit 111 is further configured to determine a low-frequency coefficient matrix of the image to be detected from the frequency coefficient matrix of the image to be detected after determining the frequency coefficient matrix of the image to be detected. For example, in connection with fig. 8, the acquisition unit 111 may be configured to perform Sc.
Optionally, as shown in fig. 17, the obtaining unit 111 provided in the embodiment of the present invention is specifically further configured to perform image binarization processing on the image to be detected to generate an intermediate image. For example, in conjunction with fig. 10, the obtaining unit 111 may be configured to execute Sa1.
The obtaining unit 111 is further configured to obtain the gray matrix of the intermediate image. For example, in conjunction with fig. 10, the obtaining unit 111 may be configured to execute Sa2.
Optionally, as shown in fig. 16, the determining unit 112 provided in the embodiment of the present invention is further specifically configured to calculate the edit distance between the first feature sequence and each feature sequence in the stored feature sequence set after the first feature sequence of the IDN to be detected is acquired. For example, in conjunction with fig. 11, the determining unit 112 may be configured to perform S2021.
The determining unit 112 is further specifically configured to determine, from the calculated editing distances, an editing distance with a smallest numerical value as the target editing distance. For example, in conjunction with fig. 11, the determination unit 112 may be configured to execute S2031.
The judging unit 113 is specifically configured to judge whether the target edit distance is smaller than a third preset threshold. For example, in conjunction with fig. 11, the judging unit 113 may be configured to execute S2041.
The determining unit 112 is further configured to determine that the IDN to be detected is illegal if the target editing distance is less than or equal to a third preset threshold. For example, in conjunction with fig. 11, the determination unit 112 may be configured to perform S2051.
In the case of implementing the functions of the integrated module in the form of hardware, the embodiment of the present invention provides another possible structural schematic diagram of the domain name detection device in the above embodiment. As shown in fig. 18, a domain name detection device 30 is used for detecting an illegal IDN, for example, for performing the domain name detection method shown in fig. 3. The domain name detection device 30 includes a processor 301, a memory 302, a communication interface 303, and a bus 304. The processor 301, the memory 302, and the communication interface 303 may be connected by the bus 304.
The processor 301 is a control center of the communication apparatus, and may be a single processor or a collective term for a plurality of processing elements. For example, the processor 301 may be a general-purpose Central Processing Unit (CPU), or may be another general-purpose processor. Wherein the general purpose processor may be a microprocessor or any conventional processor or the like.
In one embodiment, the processor 301 may include one or more CPUs, such as CPU 0 and CPU 1 shown in fig. 18.
The memory 302 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that may store static information and instructions, a Random Access Memory (RAM) or other type of dynamic storage device that may store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
As a possible implementation, the memory 302 may exist separately from the processor 301, and the memory 302 may be connected to the processor 301 through the bus 304 for storing instructions or program code. The processor 301, when calling and executing the instructions or program codes stored in the memory 302, can implement the domain name detection method provided by the embodiment of the present invention.
In another possible implementation, the memory 302 may also be integrated with the processor 301.
A communication interface 303 for connecting with other devices through a communication network. The communication network may be an ethernet network, a radio access network, a Wireless Local Area Network (WLAN), etc. The communication interface 303 may include a receiving unit for receiving data, and a transmitting unit for transmitting data.
The bus 304 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 18, but this does not mean only one bus or one type of bus.
It should be noted that the configuration shown in fig. 18 does not constitute a limitation on the domain name detecting device 30. The domain name detection device 30 may include more or fewer components than those shown in fig. 18, or some components may be combined, or a different arrangement of components than those shown.
As an example, in connection with fig. 16, the functions implemented by the acquisition unit 111, the determination unit 112, and the judgment unit 113 in the domain name detection apparatus are the same as those of the processor 301 in fig. 18.
Fig. 19 shows another hardware configuration of the domain name detection apparatus in the embodiment of the present invention. As shown in fig. 19, the domain name detecting device 40 may include a processor 401 and a communication interface 402. The processor 401 is coupled to a communication interface 402.
The functions of the processor 401 may refer to the description of the processor 301 above. The processor 401 also has a memory function, and the function of the memory 302 can be referred to above.
The communication interface 402 is used to provide data to the processor 401. The communication interface 402 may be an internal interface of the domain name detection apparatus, or may be an external interface of the domain name detection apparatus (corresponding to the communication interface 303).
It should be noted that the configuration shown in fig. 18 (or fig. 19) does not constitute a limitation on the domain name detection apparatus. In addition to the components shown in fig. 18 (or fig. 19), the domain name detection apparatus 11 may include more or fewer components than those shown, combine some components, or use a different arrangement of components.
From the above description of the embodiments, it is clear to a person skilled in the art that, for convenience and brevity of description, only the above division into functional units is used as an illustration. In practical applications, the above functions may be assigned to different functional units as needed; that is, the internal structure of the apparatus may be divided into different functional units to perform all or part of the functions described above. For the specific working processes of the system, the apparatus, and the units described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not repeated here.
The embodiment of the present invention further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the computer executes the instructions, the computer executes each step in the method flow shown in the foregoing method embodiment.
Embodiments of the present invention provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the domain name detection method of the above method embodiments.
The computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a register, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, any suitable combination of the foregoing, or any other form of computer-readable storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). In embodiments of the invention, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Since the domain name detection apparatus, the computer-readable storage medium, and the computer program product in the embodiments of the present invention can be applied to the method described above, the technical effects they achieve can be found in the above method embodiments and are not repeated here.
The above description is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto; any change or substitution within the technical scope disclosed by the present invention shall be covered by the scope of protection of the present invention.

Claims (10)

1. A method for detecting a domain name, the method comprising:
acquiring a first feature sequence; the first feature sequence is used for uniquely identifying the internationalized domain name (IDN) to be detected;
determining a similarity between the first feature sequence and each feature sequence in a stored feature sequence set; each characteristic sequence in the characteristic sequence set is used for uniquely identifying a legal domain name;
if the target similarity is larger than a first preset threshold, determining that the IDN to be detected is illegal; wherein the target similarity is the similarity with the highest value in the determined similarities;
the acquiring of the first feature sequence specifically includes:
acquiring the IDN to be detected;
loading the IDN to be detected into a preset image according to a preset rule to generate an image to be detected;
identifying the image to be detected to determine the first characteristic sequence;
the acquiring the IDN to be detected specifically includes:
receiving a domain name (DN) sent by a network device;
judging whether the DN sent by the network device can be converted into an IDN;
if the DN sent by the network device can be converted into an IDN, querying whether a blacklist and a white list include the DN sent by the network device;
if the blacklist and the white list do not include the DN sent by the network device, performing Punycode decoding on the received DN to obtain the IDN to be detected;
the identifying the image to be detected to determine the first characteristic sequence specifically includes:
determining a low-frequency coefficient matrix of the image to be detected according to the image to be detected and a Discrete Cosine Transform (DCT) algorithm; each low-frequency coefficient in the low-frequency coefficient matrix is used for reflecting the outline and the gray distribution of the IDN to be detected in the image to be detected;
and encoding the low-frequency coefficient matrix of the image to be detected by using a preset encoding rule so as to generate the first feature sequence, wherein the preset encoding rule comprises: if any low-frequency coefficient in the low-frequency coefficient matrix of the image to be detected is smaller than a second preset threshold, determining that the digital code corresponding to that low-frequency coefficient in the first feature sequence is a first numerical value.
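The feature extraction and decision steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that the image to be detected is already available as a square grayscale matrix, that the low-frequency coefficients are the top-left k x k block of the 2D DCT, that the second preset threshold defaults to the mean of that block, that the first numerical value is 0 (with 1 used otherwise), and that similarity is the fraction of matching codes. All function names, `k`, and the 0.9 default are hypothetical.

```python
import math

def dct2(img):
    """Naive orthonormal 2D DCT-II of a square grayscale matrix (list of lists)."""
    n = len(img)
    scale = [math.sqrt(1.0 / n)] + [math.sqrt(2.0 / n)] * (n - 1)
    cos = [[math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
           for u in range(n)]
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += img[x][y] * cos[u][x] * cos[v][y]
            out[u][v] = scale[u] * scale[v] * s
    return out

def feature_sequence(img, k=4, second_threshold=None):
    """Encode the top-left k x k DCT block as a list of 0/1 codes."""
    coeffs = dct2(img)
    low = [coeffs[u][v] for u in range(k) for v in range(k)]
    if second_threshold is None:
        # Assumption: use the mean of the low-frequency block as the threshold.
        second_threshold = sum(low) / len(low)
    # Assumption: the "first numerical value" is 0; any other coefficient maps to 1.
    return [0 if c < second_threshold else 1 for c in low]

def similarity(a, b):
    """Assumed similarity measure: fraction of positions with equal codes."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)

def is_illegal(first_seq, stored_seqs, first_threshold=0.9):
    """Flag the IDN when its highest similarity exceeds the first threshold."""
    if not stored_seqs:
        return False
    target_similarity = max(similarity(first_seq, s) for s in stored_seqs)
    return target_similarity > first_threshold
```

In this sketch a higher similarity means the rendered IDN looks more like a stored legal domain name, which is why exceeding the threshold marks the IDN as illegal; names already on the white list are assumed to have been filtered out beforehand, as the claim describes.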
2. The domain name detection method according to claim 1, characterized in that the method further comprises:
acquiring a plurality of legal domain names;
respectively generating a feature sequence of each legal domain name in the plurality of legal domain names according to the plurality of legal domain names, the preset rule and the preset image; the feature sequence of each legal domain name in the plurality of legal domain names is used for uniquely identifying one legal domain name in the plurality of legal domain names;
and storing the plurality of legal domain names and the characteristic sequences of the legal domain names in the plurality of legal domain names to generate the characteristic sequence set.
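As an illustration of claim 2, the stored feature sequence set can be sketched as a mapping from each legal domain name to a record holding its feature sequence. This is a sketch under assumptions: `FeatureRecord`, `build_feature_set`, and `toy_feature` are hypothetical names, and `toy_feature` is only a stand-in for the image generation and recognition steps, so it does not guarantee that each sequence uniquely identifies one domain name.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Sequence, Tuple

@dataclass(frozen=True)
class FeatureRecord:
    domain: str                # the legal domain name
    feature: Tuple[int, ...]   # its feature sequence
    length: int                # string length, useful for later filtering

def build_feature_set(
    legal_domains: Iterable[str],
    make_feature: Callable[[str], Sequence[int]],
) -> Dict[str, FeatureRecord]:
    """Generate and store one feature sequence per legal domain name."""
    feature_set = {}
    for name in legal_domains:
        feature_set[name] = FeatureRecord(name, tuple(make_feature(name)), len(name))
    return feature_set

def toy_feature(name: str, bits: int = 16) -> Sequence[int]:
    """Placeholder feature: parity of each code point, padded or cut to `bits`.

    A real system would render the name into the preset image and encode
    the recognised image instead.
    """
    padded = name.ljust(bits)[:bits]
    return [ord(ch) & 1 for ch in padded]
```

Passing the feature function as a parameter keeps the storage step independent of how the sequences are actually produced, so the same rule and image can be applied to both the legal domain names and the IDN to be detected.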
3. The domain name detection method according to claim 2, wherein the feature sequence set further includes a string length of each legal domain name; the determining the similarity between the first feature sequence and each feature sequence in the stored feature sequence set specifically includes:
acquiring the length of the character string of the IDN to be detected;
determining a first target characteristic sequence in the characteristic sequence set according to the length of the character string of the IDN to be detected; the character string length of the legal domain name corresponding to the first target characteristic sequence is the same as the character string length of the IDN to be detected;
and calculating the similarity between the first feature sequence and the determined first target feature sequence.
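The length-based pre-filtering of claim 3 can be sketched as follows. The record layout (dictionaries with `domain`, `length`, and `feature` keys), the function names, and the matching-fraction similarity are assumptions made for illustration; string length is taken here as the number of code points in the decoded IDN.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Record = Dict[str, object]  # hypothetical layout: "domain", "length", "feature"

def first_target_sequences(idn: str, feature_set: List[Record]) -> List[Record]:
    """Keep only legal domain names whose string length equals that of the IDN."""
    n = len(idn)
    return [rec for rec in feature_set if rec["length"] == n]

def similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Assumed similarity measure: fraction of positions with equal codes."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)

def best_match(
    idn: str, first_seq: Sequence[int], feature_set: List[Record]
) -> Optional[Tuple[str, float]]:
    """Return the most similar same-length legal domain name, if any."""
    candidates = first_target_sequences(idn, feature_set)
    if not candidates:
        return None
    scored = [(rec["domain"], similarity(first_seq, rec["feature"]))
              for rec in candidates]
    return max(scored, key=lambda pair: pair[1])
```

Filtering by length first means the similarity is computed only against the subset of stored sequences that could plausibly be visual twins of the IDN, rather than against the whole set.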
4. The domain name detection method according to claim 2, characterized in that the feature sequence set further comprises language family identifiers of the legal domain names; the determining the similarity between the first feature sequence and each feature sequence in the stored feature sequence set specifically includes:
acquiring a language family identifier of the IDN to be detected;
determining a target language family identifier according to the language family identifier of the IDN to be detected; wherein, the similarity between the character corresponding to the target language family identifier and the character included in the language family of the IDN to be detected is greater than a fourth preset threshold;
determining a second target characteristic sequence set in the characteristic sequence set according to the target language family identifier; each second target feature sequence in the second target feature sequence set corresponds to the target language family identifier;
and calculating the similarity between the first characteristic sequence and the determined second target characteristic sequence.
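A rough sketch of the language-family filtering in claim 4 is given below. It rests on several assumptions that are not in the claims: the language family identifier is approximated by the first word of each character's Unicode name (for example `LATIN`, `CYRILLIC`, `CJK`), the most frequent family in the label is used, and the similarity between families comes from a caller-supplied table whose values are purely illustrative. All names are hypothetical.

```python
import unicodedata
from collections import Counter
from typing import Dict, FrozenSet, List, Set

def language_family(label: str) -> str:
    """Heuristic family identifier: most common leading word of the Unicode names."""
    counts = Counter(
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    )
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"

def target_families(
    family: str,
    family_similarity: Dict[FrozenSet[str], float],
    fourth_threshold: float,
) -> Set[str]:
    """Families whose characters resemble those of `family` above the threshold."""
    targets = {family}  # a family is treated as fully similar to itself
    for pair, score in family_similarity.items():
        if family in pair and score > fourth_threshold:
            targets |= set(pair)
    return targets

def second_target_sequences(records: List[dict], targets: Set[str]) -> List[dict]:
    """Keep only stored records whose language family is a target family."""
    return [rec for rec in records if rec["family"] in targets]
```

With a table that rates Cyrillic and Latin as visually close and CJK and Latin as distant, a Cyrillic IDN would be compared only with stored Cyrillic and Latin domain names.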
5. A domain name detection device is characterized by comprising an acquisition unit and a determination unit;
the acquiring unit is used for acquiring a first feature sequence; the first feature sequence is used for uniquely identifying the internationalized domain name (IDN) to be detected;
the determining unit is configured to determine, according to the first feature sequence obtained by the obtaining unit, a similarity between the first feature sequence and each feature sequence in a stored feature sequence set; each characteristic sequence in the characteristic sequence set is used for uniquely identifying a legal domain name;
the determining unit is further configured to determine that the IDN to be detected is illegal if the target similarity is greater than a first preset threshold; the target similarity is the similarity with the highest value in the determined similarities;
the acquisition unit specifically comprises an acquisition subunit, a generation subunit and a determination subunit;
the obtaining subunit is configured to obtain the IDN to be detected;
the generation subunit is configured to load the IDN to be detected acquired by the acquisition subunit into a preset image according to a preset rule, so as to generate an image to be detected;
the determining subunit is configured to identify the image to be detected generated by the generating subunit to determine the first feature sequence;
the acquiring unit is further configured to receive a DN sent by a network device;
the determining unit is further configured to determine whether a DN sent by the network device can be converted into an IDN;
the determining unit is further configured to query whether a black list and a white list include a DN sent by the network device if the DN sent by the network device can be converted into an IDN;
the determining unit is further configured to perform Punycode decoding on the received DN if the black list and the white list do not include the DN sent by the network device, so as to obtain the IDN to be detected;
the determining subunit is specifically configured to determine a low-frequency coefficient matrix of the image to be detected according to the image to be detected and a Discrete Cosine Transform (DCT) algorithm; each low-frequency coefficient in the low-frequency coefficient matrix is used for reflecting the outline and the gray distribution of the IDN to be detected in the image to be detected;
the determining subunit is further configured to encode, by using a preset encoding rule, the low-frequency coefficient matrix of the image to be detected to generate the first feature sequence, where the preset encoding rule includes: if any low-frequency coefficient in the low-frequency coefficient matrix of the image to be detected is smaller than a second preset threshold, determining that the digital code corresponding to that low-frequency coefficient in the first feature sequence is a first numerical value.
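The DN handling recited in this claim (receive a DN, check whether it can be converted into an IDN, consult the blacklist and white list, then Punycode-decode) can be sketched in Python as follows. The function names are hypothetical, and the check for the `xn--` prefix is an assumed way of deciding that a DN can be converted into an IDN. The sketch uses the `punycode` codec from the Python standard library; a malformed label would raise a `UnicodeError`, which is left unhandled here.

```python
from typing import Optional, Set

ACE_PREFIX = "xn--"  # prefix that marks a Punycode-encoded label

def can_convert_to_idn(dn: str) -> bool:
    """Assumed test: at least one label carries the ACE prefix."""
    return any(label.startswith(ACE_PREFIX) for label in dn.lower().split("."))

def punycode_decode(dn: str) -> str:
    """Decode every ACE-prefixed label of the DN back to Unicode."""
    labels = []
    for label in dn.lower().split("."):
        if label.startswith(ACE_PREFIX):
            label = label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
        labels.append(label)
    return ".".join(labels)

def acquire_idn(dn: str, blacklist: Set[str], whitelist: Set[str]) -> Optional[str]:
    """Return the IDN to be detected, or None when no detection is needed."""
    dn = dn.lower().rstrip(".")
    if not can_convert_to_idn(dn):
        return None  # plain ASCII domain name, outside the scope of IDN detection
    if dn in blacklist or dn in whitelist:
        return None  # the verdict is already known from the lists
    return punycode_decode(dn)
```

Returning `None` for listed names reflects the ordering in the claim: only DNs that appear in neither list are decoded and passed on to image generation.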
6. The domain name detection apparatus according to claim 5, characterized in that the domain name detection apparatus further comprises a generation unit;
the acquiring unit is further configured to acquire a plurality of legal domain names;
the generating unit is configured to generate a feature sequence of each legal domain name in the multiple legal domain names respectively according to the multiple legal domain names, the preset rule, and the preset image acquired by the acquiring unit; the feature sequence of each legal domain name in the plurality of legal domain names is used for uniquely identifying one legal domain name in the plurality of legal domain names;
the generating unit is further configured to store the plurality of legal domain names and the feature sequences of the legal domain names in the plurality of legal domain names after generating the feature sequences of the legal domain names in the plurality of legal domain names, so as to generate the feature sequence set.
7. The domain name detection device according to claim 6, wherein the feature sequence set further includes a string length of each legal domain name in the plurality of legal domain names; the determining unit is specifically configured to obtain a length of the string of the IDN to be detected;
the determining unit is specifically configured to determine, after the length of the character string of the IDN to be detected is obtained, a first target feature sequence in the feature sequence set according to the length of the character string of the IDN to be detected; the length of the character string of the legal domain name corresponding to each first target characteristic sequence in the first target characteristic sequence set is the same as that of the character string of the IDN to be detected;
the determining unit is specifically further configured to calculate a similarity between the first feature sequence and the determined first target feature sequence.
8. The domain name detection device according to claim 6, wherein the feature sequence set further includes a language family identifier of each legal domain name in the plurality of legal domain names;
the determining unit is specifically further configured to obtain a language family identifier of the IDN to be detected;
the determining unit is specifically configured to determine a target language family identifier according to the language family identifier of the IDN to be detected after acquiring the language family identifier of the IDN to be detected; wherein, the similarity between the character corresponding to the target language family identifier and the character included in the language family of the IDN to be detected is greater than a fourth preset threshold;
the determining unit is specifically configured to determine, after determining the target language family identifier, a second target feature sequence in the feature sequence set according to the target language family identifier; wherein the second target feature sequence corresponds to the target language family identifier;
the determining unit is specifically further configured to calculate a similarity between the first feature sequence and the determined second target feature sequence.
9. A computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computer, cause the computer to perform the domain name detection method of any of claims 1-4.
10. A domain name detecting apparatus, comprising: a processor, a memory, and a communication interface; the communication interface is used for the domain name detection device to communicate with other equipment or a network; the memory is used for storing one or more programs, the one or more programs include computer-executable instructions, and when the domain name detection device runs, the processor executes the computer-executable instructions stored in the memory to enable the domain name detection device to execute the domain name detection method according to any one of claims 1 to 4.
CN202010408567.8A 2020-05-14 2020-05-14 Domain name detection method and device Active CN111654472B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010408567.8A CN111654472B (en) 2020-05-14 2020-05-14 Domain name detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010408567.8A CN111654472B (en) 2020-05-14 2020-05-14 Domain name detection method and device

Publications (2)

Publication Number Publication Date
CN111654472A CN111654472A (en) 2020-09-11
CN111654472B true CN111654472B (en) 2022-05-24

Family

ID=72348186

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010408567.8A Active CN111654472B (en) 2020-05-14 2020-05-14 Domain name detection method and device

Country Status (1)

Country Link
CN (1) CN111654472B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103428307A (en) * 2013-08-09 2013-12-04 中国科学院计算机网络信息中心 Method and equipment for detecting counterfeit domain names
CN106170002A (en) * 2016-09-08 2016-11-30 中国科学院信息工程研究所 A kind of Chinese counterfeit domain name detection method and system
CN107370718A (en) * 2016-05-12 2017-11-21 深圳市深信服电子科技有限公司 The detection method and device of black chain in webpage

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9218335B2 (en) * 2012-10-10 2015-12-22 Verisign, Inc. Automated language detection for domain names
CN103957191A (en) * 2014-04-03 2014-07-30 中国科学院计算机网络信息中心 Detection method for Chinese domain name spoof attack


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
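Detection and measurement of homograph internationalized domain names; Liu Ying et al.; Journal of Southeast University (Natural Science Edition); 2017-11-20; Vol. 47; Section 1.2, paragraph 1 through the last paragraph of Section 5 *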

Also Published As

Publication number Publication date
CN111654472A (en) 2020-09-11


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230619

Address after: 12 / F, Dongfang hope scientific research building, No.3, Gaopeng Avenue, high tech Zone, Chengdu, Sichuan 610041

Patentee after: ASIAINFO TECHNOLOGIES (CHENGDU), Inc.

Patentee after: AsiaInfo Security Technology Co.,Ltd.

Address before: 12 / F, Dongfang hope scientific research building, No.3, Gaopeng Avenue, high tech Zone, Chengdu, Sichuan 610041

Patentee before: ASIAINFO TECHNOLOGIES (CHENGDU), Inc.
