CN114363290B - Domain name identification method, device, equipment and storage medium - Google Patents


Publication number
CN114363290B
Authority
CN
China
Prior art keywords
domain name
data
sub
feature
level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111672272.2A
Other languages
Chinese (zh)
Other versions
CN114363290A (en
Inventor
赖秋楠
梁彧
傅强
蔡琳
杨满智
田野
王杰
阿曼太
金红
陈晓光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Eversec Beijing Technology Co Ltd
Original Assignee
Eversec Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Eversec Beijing Technology Co Ltd
Priority to CN202111672272.2A
Publication of CN114363290A
Application granted
Publication of CN114363290B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The embodiment of the invention discloses a domain name identification method, a device, equipment and a storage medium. The method comprises the following steps: determining sub-domain name data in domain name data to be identified; acquiring at least one sub domain name feature of sub domain name data, and matching each sub domain name feature with a target feature condition to obtain a sub domain name feature matching result; the target characteristic conditions are determined according to the subdomain name level threshold value and the characteristic structure data; and under the condition that the characteristic matching result of each subdomain name is determined to be successful in matching, determining the domain name data to be identified as target domain name data. The embodiment of the invention can realize rapid and batch domain name identification without occupying network communication bandwidth, saves network resources and reduces domain name identification cost.

Description

Domain name identification method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a domain name identification method, a device, equipment and a storage medium.
Background
Identifying the type of a domain name is important for technologies such as network operation and maintenance and network security detection. For example, CDN (Content Delivery Network) technology can return multiple IP (Internet Protocol) addresses that provide services in DNS (Domain Name System) records, so that even if a single server fails, other servers can still provide services, which greatly improves service availability. In addition, load balancing can be achieved by rotating among the multiple service IP addresses. When CDN technology is used in a network, CDN domain names can be identified and their access speeds tested. Alternatively, during network security detection, a CDN domain name has a structure similar to that of an illegal domain name generated by the Fast-flux technique and is therefore easily misjudged as an abnormal domain name; the CDN domain names can then be further identified among the abnormal domain names so that illegal domain names are detected accurately.
In the prior art, identifying the domain name type generally requires accessing the domain name and the IP address corresponding to it. However, such a method cannot identify domain name types quickly and in batches; it occupies network communication bandwidth, consumes network resources, and makes domain name identification extremely costly.
Disclosure of Invention
The embodiment of the invention provides a domain name identification method, device, equipment and storage medium, so as to identify domain names rapidly and in batches without occupying network communication bandwidth, thereby saving network resources and reducing the cost of domain name identification.
In a first aspect, an embodiment of the present invention provides a domain name identification method, including:
determining sub-domain name data in domain name data to be identified;
acquiring at least one sub domain name feature of the sub domain name data, and matching each sub domain name feature with a target feature condition to obtain a sub domain name feature matching result; the target characteristic conditions are determined according to the subdomain name level threshold value and characteristic structure data;
and under the condition that the matching result of the features of all the subdomains is successful, determining the domain name data to be identified as target domain name data.
In a second aspect, an embodiment of the present invention further provides a domain name identifying apparatus, including:
the sub domain name determining module is used for determining sub domain name data in the domain name data to be identified;
the feature matching module is used for acquiring at least one sub-domain name feature of the sub-domain name data, and matching each sub-domain name feature with a target feature condition to obtain a sub-domain name feature matching result; the target characteristic conditions are determined according to the subdomain name level threshold value and characteristic structure data;
and the first determining module is used for determining the domain name data to be identified as target domain name data under the condition that the characteristic matching result of each subdomain name is determined to be successful in matching.
In a third aspect, an embodiment of the present invention further provides a computer apparatus, including:
one or more processors;
a storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the domain name identification method provided by any embodiment of the present invention.
In a fourth aspect, the embodiment of the present invention further provides a computer storage medium, where a computer program is stored, where the program when executed by a processor implements the domain name identification method provided in any embodiment of the present invention.
According to the embodiments of the invention, sub-domain name data is determined in the domain name data to be identified, at least one sub-domain name feature of the sub-domain name data is acquired, and each sub-domain name feature is matched with a target feature condition determined according to the sub-domain name level threshold and the feature structure data, to obtain a sub-domain name feature matching result. The domain name data to be identified is determined to be target domain name data when each sub-domain name feature matching result is a successful match. The domain name is thus identified based on its own features, which avoids the low identification efficiency and heavy use of network resources caused in the prior art by relying on access to the domain name and its IP address. Domain names can be identified rapidly and in batches without occupying network communication bandwidth, which saves network resources and reduces the cost of domain name identification.
Drawings
Fig. 1 is a flowchart of a domain name recognition method according to a first embodiment of the present invention.
Fig. 2 is a flowchart of a domain name recognition method according to a second embodiment of the present invention.
Fig. 3 is a schematic flow chart of CDN domain name identification according to a second embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a domain name recognition device according to a third embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof.
It should be further noted that, for convenience of description, only some, but not all of the matters related to the present invention are shown in the accompanying drawings. Before discussing exemplary embodiments in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart depicts operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently, or at the same time. Furthermore, the order of the operations may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figures. The processes may correspond to methods, functions, procedures, subroutines, and the like.
Example one
Fig. 1 is a flowchart of a domain name recognition method according to a first embodiment of the present invention. This embodiment is applicable to recognizing whether any domain name is a domain name of a specific type. The method may be performed by the domain name recognition device provided by an embodiment of the present invention; the device may be implemented in software and/or hardware and may generally be integrated in a computer device. Accordingly, as shown in fig. 1, the method includes the following operations:
S110, determining sub domain name data in the domain name data to be identified.
The domain name data to be identified may be the content data of a domain name for which it needs to be determined whether the domain name is of a specific type. The sub-domain name data may be the content data of the sub-domain names in the domain name data to be identified.
Accordingly, the domain name data to be identified may generally include content data of each domain name structure, and the domain name structure may generally include a root domain name and a main domain name, and may further extend a sub domain name based on the main domain name. Thus, the sub-domain name data portion thereof can be determined in the domain name data to be identified.
S120, at least one sub domain name feature of the sub domain name data is obtained, and each sub domain name feature is matched with a target feature condition, so that a sub domain name feature matching result is obtained.
The target characteristic conditions are determined according to the subdomain name level threshold value and characteristic structure data.
Specifically, the subdomain name feature may be a feature of any dimension that the subdomain name has, and may include, but is not limited to, a feature that may be matched with the target feature condition. The target feature condition may be a specific condition that the feature of the subdomain name of the specific type of domain name to be identified satisfies. The subdomain name level threshold may be an extremum of the number of subdomain name levels included in a particular type of domain name that needs to be identified. The feature structure data may be content data of a specific structure included in a subdomain name in a specific type of domain name to be identified.
Accordingly, the target feature condition can be predetermined according to the specific type of domain name to be identified and the feature of the sub domain name in the type of domain name. The characteristics of the sub-domain names in the specific type domain name can include characteristics of the number dimension of the sub-domain name levels, and then the sub-domain name level threshold value can be determined according to the extreme value of the number of the sub-domain name levels included in the specific type domain name, so that the target characteristic condition is determined according to the sub-domain name level threshold value and used for matching the domain name data to be identified, of which the number of the sub-domain name levels is within the range limited by the sub-domain name level threshold value. The features of the sub-domain name in the specific type domain name may further include features of content data included in the sub-domain name, and then content of a specific structure included in the sub-domain name in the specific type domain name may be used as feature structure data, so that a target feature condition is determined according to the feature structure data, and the target feature condition is used for matching out domain name data to be identified including the feature structure data in the content of the sub-domain name.
Further, at least one sub-domain name feature can be obtained from the obtained sub-domain name data of the domain name data to be identified, and each sub-domain name feature is matched with a predetermined target feature condition to obtain a sub-domain name feature matching result, so that the sub-domain name feature matching result can describe whether the sub-domain name feature of the sub-domain name data in the domain name data to be identified meets the target feature condition, and whether the domain name data to be identified is the content data of the specific type of domain name to be identified can be determined according to the sub-domain name feature matching result.
S130, in the case that each sub-domain name feature matching result is determined to be a successful match, determining the domain name data to be identified as target domain name data.
Wherein, successful matching may indicate that the sub-domain name feature satisfies the description of the target feature condition. The target domain name data may be content data of a specific type of domain name that needs to be detected.
Correspondingly, if all sub-domain name feature matching results of the domain name data to be identified are successful matches, every sub-domain name feature of the sub-domain name data satisfies the corresponding target feature condition. It can then be determined that the domain name data to be identified has the specific features of the target domain name data, and therefore that the domain name data to be identified is the target domain name data.
The embodiment of the invention provides a domain name identification method in which sub-domain name data is determined in the domain name data to be identified, at least one sub-domain name feature of the sub-domain name data is acquired, and each sub-domain name feature is matched with a target feature condition determined according to the sub-domain name level threshold and the feature structure data, to obtain a sub-domain name feature matching result. The domain name data to be identified is determined to be the target domain name data when each sub-domain name feature matching result is a successful match. The domain name is thus identified based on its own features, which avoids the low identification efficiency and heavy use of network resources caused in the prior art by accessing the domain name and its IP address. Domain names can be identified rapidly and in batches without occupying network communication bandwidth, which saves network resources and reduces the cost of domain name identification.
Example two
Fig. 2 is a flowchart of a domain name recognition method according to a second embodiment of the present invention. This embodiment is a refinement of the foregoing embodiment. It provides a specific optional implementation of acquiring at least one sub-domain name feature of the sub-domain name data and matching each sub-domain name feature with a target feature condition to obtain a sub-domain name feature matching result.
As shown in fig. 2, the method in the embodiment of the present invention specifically includes:
S210, determining sub domain name data in the domain name data to be identified.
In an optional embodiment of the present invention, the determining the sub-domain name data in the domain name data to be identified may include: identifying terminal root domain name data in the domain name data to be identified, and determining root domain name separation data of the terminal root domain name data; determining main domain name data in the domain name data to be identified and main domain name separation data of the main domain name data according to the tail end root domain name data and the root domain name separation data; and determining the subdomain name data in the domain name data to be identified according to the main domain name data and the main domain name separation data.
The terminal root domain name data may be root domain name content data located at the extreme end of the domain name data to be identified. The primary domain name data may be content data in a structure separated from the end root domain name data by only root domain name separation data. The root domain name separation data is used for separating the tail end root domain name data from the main domain name data. The main domain name separation data is used for separating the main domain name data from the sub domain name data. Alternatively, the root domain name separation data and the main domain name separation data may be any identical or different character distinguishable from the domain name content data, for example, may be a "." symbol in the domain name.
Accordingly, the root domain name in the structure of the extreme end of the domain name data to be identified can be determined as the root domain name of the domain name to be identified, so that the extreme end root domain name data can be identified accordingly. The separation data at the adjacent position of the terminal root domain name can be determined as root domain name separation data, which is used for showing the root domain name structure of the terminal of the domain name data to be identified and distinguishing the terminal root domain name data from the main domain name data. Accordingly, the main domain name data separated from the end root domain name data by the root domain name separation data can be determined from the end root domain name data and the root domain name separation data.
Further, the root domain name separation data and the main domain name separation data can jointly represent a main domain name structure of the domain name data to be identified, the main domain name data can be adjacent to the root domain name separation data and the main domain name separation data respectively, and then the main domain name separation data can be determined according to the determined main domain name data and the root domain name separation data. Accordingly, sub-domain name data separated from the main domain name data by the main domain name separation data can be determined further from the main domain name data and the main domain name separation data.
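The decomposition described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that both the root domain name separation data and the main domain name separation data are the "." symbol and that the terminal root domain name is the last label. The names `DomainParts` and `split_domain` are hypothetical.

```python
from typing import NamedTuple


class DomainParts(NamedTuple):
    root: str  # terminal root domain name data, e.g. "com"
    main: str  # main domain name data, e.g. "wsglb0"
    sub: str   # sub-domain name data; empty when there is no sub-domain name


def split_domain(domain: str, sep: str = ".") -> DomainParts:
    """Split a domain name into root, main and sub-domain name data.

    Assumption: the last label is the terminal root domain name and the
    label before it is the main domain name.
    """
    labels = domain.strip().rstrip(sep).lower().split(sep)
    if len(labels) < 2 or not all(labels):
        raise ValueError(f"not a usable domain name: {domain!r}")
    root = labels[-1]               # end of the domain name data
    main = labels[-2]               # separated from the root by one separator
    sub = sep.join(labels[:-2])     # everything in front of the main domain name
    return DomainParts(root, main, sub)
```

Under these assumptions, `split_domain("apkselfdl.ivo.com.cn.wsglb0.com")` yields the root "com", the main domain name "wsglb0" and the sub-domain name data "apkselfdl.ivo.com.cn".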
S220, at least one sub domain name feature of the sub domain name data is obtained, and each sub domain name feature is matched with a target feature condition, so that a sub domain name feature matching result is obtained.
The target characteristic conditions are determined according to the subdomain name level threshold value and characteristic structure data.
In an alternative embodiment of the present invention, S220 may specifically include:
S221, acquiring the level quantity characteristics of the sub domain name data.
Wherein the level number feature may be a feature describing the level number of the sub domain name data.
Correspondingly, the domain name data to be identified can comprise one-level or multi-level sub-domain names, and as the target feature condition can be determined according to the sub-domain name level threshold value, the level number feature of the sub-domain name data can be obtained according to the sub-domain name level number of the domain name data to be identified, so that a sub-domain name feature matching result can be obtained according to the matching degree between the level number feature and the description of the target feature condition.
In an optional embodiment of the present invention, the acquiring the level number feature of the sub-domain name data may include: identifying level separation data in the sub-domain name data; in the case that the number of level separation data in the sub domain name data is determined to be greater than or equal to a separation number threshold, determining that the level number feature is that the level number is greater than or equal to the sub domain name level threshold.
The level separation data may be any character that can be distinguished from the content data of the adjacent two levels of sub-domain names, and may be used to separate the sub-domain name data of adjacent different levels. The separation number threshold may be predetermined according to a sub-domain name level threshold, and may be a level separation data number in the sub-domain name data when the sub-domain name level is the sub-domain name level threshold.
Accordingly, the sub-domain name content data of each level in the sub-domain name data can be separated by the level separation data, and the number of level separation data is positively correlated with the number of sub-domain name levels; for example, the number of sub-domain name levels is generally equal to the number of level separation data plus one. The level separation data can therefore be identified in the sub-domain name data, and when the number of level separation data is determined to be greater than or equal to the separation number threshold, it can be determined that the number of levels into which the sub-domain name data is divided by the level separation data is greater than or equal to the sub-domain name level threshold.
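The relationship between the separator count and the level count can be sketched as below. The function name `meets_level_threshold` and the default threshold of 2 are illustrative assumptions; the separation number threshold is derived as the level threshold minus one, following the "levels = separators + 1" relationship stated above.

```python
def meets_level_threshold(sub: str, level_threshold: int = 2, sep: str = ".") -> bool:
    """Return True when the sub-domain name data has at least `level_threshold` levels.

    Only the level separation data is counted; the levels are not split out.
    """
    if not sub:
        return False  # no sub-domain name data means zero levels
    separation_threshold = level_threshold - 1  # levels = separators + 1
    return sub.count(sep) >= separation_threshold
```

For instance, the sub-domain name data "apkselfdl.ivo.com.cn" contains three separators and so reaches a level threshold of 2, while "www" contains none and does not.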
S222, acquiring at least one level content data of the sub domain name data under the condition that the level number of the sub domain name data is larger than or equal to the sub domain name level threshold according to the level number characteristics.
Wherein the level content data may be content data of any level of sub domain name.
Correspondingly, the characteristics of the sub-domain name in the target domain name data can include that the number of sub-domain name levels is a plurality of, in order to judge whether the domain name to be identified is the target domain name data, a minimum value of the number of sub-domain name levels of the target domain name data can be determined to be a domain name level threshold value, and the target characteristic condition can include that the number of the sub-domain name data levels is greater than or equal to the sub-domain name level threshold value, and then part of the domain name data to be identified, of which the number of the sub-domain name levels is greater than or equal to the sub-domain name level threshold value, can be screened out according to the number of the sub-domain name data levels.
Alternatively, the target domain name data may be a CDN domain name. Specifically, if a domain name with more than one sub domain name level exists in the access domain name provided by the CDN, the sub domain name level threshold may be determined as a minimum value 2 of the number of levels that may occur, and the domain name data to be identified with more than one sub domain name level may be screened out according to the sub domain name level threshold, so that the CDN domain name may be further identified in the portion of domain name data to be identified.
When abnormal domain names are screened, CDN domain names are easily identified as abnormal because of their multi-level sub-domain names. In that case the domain names to be identified may be the abnormal domain names, and the target domain name data may be the CDN domain names that need to be identified among them, so that illegal domain names can then be located among the remaining abnormal domain names. Therefore, to distinguish CDN domain names among the abnormal domain names, the minimum number of sub-domain name levels of a CDN domain name may be taken as the sub-domain name level threshold, and the target feature condition may include that the number of levels of the sub-domain name data is greater than or equal to the sub-domain name level threshold. The domain name data to be identified that has that many sub-domain name levels, and is therefore suspected of being a CDN domain name, can then be screened out by its number of levels and examined further.
In an optional embodiment of the invention, the acquiring the at least one level of content data of the sub domain name data may include: acquiring the level content data of the terminal level of the subdomain name data; and, in the case that the level content data is determined not to include the feature structure data, repeatedly executing the step of acquiring the level content data of the previous level, until a piece of level content data is determined to include the feature structure data, or until the level content data of every level is determined not to include the feature structure data.
The end level may be the level at the end of each level of the sub domain name.
Accordingly, in the case where it is determined that the number of levels of the domain name data to be recognized is greater than or equal to the sub domain name level threshold, it may be explained that the sub domain name data in the domain name data to be recognized includes a plurality of levels of content data, then the level content data of the end level of the sub domain name data may be acquired, and it may be determined whether or not the feature structure data is included therein. Specifically, any available method may be used to determine whether the level content data includes feature structure data, for example, a fuzzy matching method may be used, which is not limited herein.
Further, in the case where it is determined that the feature structure data is included in the level content data of the end level, the level content data of the levels before it need not be acquired. If it is determined that the feature structure data is not included in the level content data of the end level, the level content data of the previous level may be acquired, and the check is repeated level by level toward the front. The process stops either when the acquired level content data is determined to include the feature structure data, in which case no earlier level is acquired, or when the level content data of all levels in the sub-domain name data has been acquired and none of it is determined to include the feature structure data.
S223, in the case that any one of the level content data is determined to include the feature structure data, determining that the sub-domain name feature matching result is a successful match.
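The backward scan over the levels can be sketched as follows. The text leaves the matching method open (fuzzy matching is mentioned as one option); this sketch swaps in exact membership in a caller-supplied set of feature labels, which is an assumption, as is the name `find_feature_level`.

```python
from typing import Iterable, Optional


def find_feature_level(sub: str, features: Iterable[str], sep: str = ".") -> Optional[int]:
    """Scan sub-domain levels from the end level toward the front.

    Returns the index of the first level (counted from the front, 0-based)
    whose content is one of `features`, or None when no level matches.
    The scan stops as soon as a match is found, so earlier levels are not read.
    """
    feature_set = set(features)
    levels = sub.split(sep) if sub else []
    for index in range(len(levels) - 1, -1, -1):  # end level first
        if levels[index] in feature_set:
            return index
    return None
```

A return value other than `None` corresponds to a successful sub-domain name feature match; `None` corresponds to the case in which every level has been examined without finding feature structure data.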
Accordingly, the characteristics of the sub domain name in the target domain name data may further include content data including a specific structure in the sub domain name data. Therefore, in order to further identify the to-be-identified domain name data with the number of levels of the sub-domain name data being greater than or equal to the sub-domain name level threshold, the content data including the specific structure in the sub-domain name data of the target domain name data can be determined to be the feature structure data, and the target feature condition can be determined to include the feature structure data in the sub-domain name data, so that the to-be-identified domain name data including the feature structure data in the sub-domain name data can be further screened out from the to-be-identified domain name data with the number of levels of the sub-domain name data being greater than or equal to the sub-domain name level threshold, and matching of the respective features of the number of levels of the sub-domain names of the to-be-identified domain name data and the level content data through the target feature condition is achieved, and a sub-domain name feature matching result corresponding to the to-be-identified domain name data is obtained.
Alternatively, the feature structure data that may be included in the sub-domain name data of a CDN domain name may be root domain name structure content data. In particular, the root domain name structure content data may be content data that is typically found in a root domain name; among CDN domain names there are domain names formed by joining two conventional domain names with separation data.
By way of example, the CDN domain name data "apkselfdl.ivo.com.cn.wsglb0.com" is composed of the domain names "apkselfdl.ivo.com.cn" and "wsglb0.com". Using the "." symbol, the root domain name data located at the end is identified as "com" and the main domain name data as "wsglb0"; the domain name therefore has four levels of sub-domain names, and the end of the sub-domain name data is the root domain name structure content data "com.cn". Accordingly, for domain name data to be identified that includes multi-level sub-domain names, when root domain name structure content data is identified in any sub-domain name level while scanning from the end level toward the front, that level can be determined to be the root domain name of the first of the two conventional domain names that make up the CDN domain name. The sub-domain name feature matching result obtained is then a successful match, which can be used to determine that the domain name to be identified is a CDN domain name. Moreover, when CDN domain names need to be identified among abnormal domain names, the sub-domain names of an illegal domain name with multi-level sub-domain names do not include the feature structure data, so CDN domain names and illegal domain names can be distinguished among the abnormal domain names.
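The worked example can be traced end to end with a short sketch. It is illustrative only: `ROOT_LIKE_LABELS` is a hypothetical, deliberately tiny set standing in for the root domain name structure content data, the last label is assumed to be the terminal root domain name, and `is_cdn_like` is not a name taken from the patent. The main-domain fallback for domain names below the level threshold is not covered here.

```python
# Hypothetical stand-in for root domain name structure content data.
ROOT_LIKE_LABELS = frozenset({"com", "cn", "net", "org"})


def is_cdn_like(domain: str, level_threshold: int = 2) -> bool:
    """Apply the sub-domain level check, then the backward feature scan."""
    labels = domain.lower().rstrip(".").split(".")
    if len(labels) < 3:
        return False  # no sub-domain name data at all
    sub_levels = labels[:-2]  # drop the main domain name and the end root domain name
    if len(sub_levels) < level_threshold:
        return False  # too few levels; the sub-domain match fails
    # Scan from the end level toward the front; stop at the first root-like label.
    return any(level in ROOT_LIKE_LABELS for level in reversed(sub_levels))
```

With these assumptions, "apkselfdl.ivo.com.cn.wsglb0.com" is accepted because the end level "cn" of its four-level sub-domain name is root-like, whereas a multi-level name such as "a1b2.c3d4.example.com" is rejected because none of its sub-domain levels carries root domain name structure content.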
In an optional embodiment of the present invention, after the obtaining at least one sub domain name feature of the sub domain name data and matching each sub domain name feature with a target feature condition, obtaining a sub domain name feature matching result, the method may further include: under the condition that the number of levels is smaller than the threshold value of the sub-domain name levels, determining that the matching result of the sub-domain name features is a matching failure, and acquiring main domain name data in the domain name data to be identified; and under the condition that the main domain name data is determined to have the target main domain name characteristics, determining the domain name data to be identified as the target domain name data.
The target main domain name feature may be a specific feature of main domain name data of specific type of target domain name data to be identified.
Accordingly, a characteristic of the target domain name data may also be that its main domain name data typically has a particular target main domain name feature that distinguishes it from the main domain name data of any other type of domain name. Therefore, if the number of levels of the domain name data to be identified is determined to be smaller than the sub domain name level threshold, which indicates that its sub domain name does not have the sub domain name features of the target domain name data, the main domain name data can be obtained, and the domain name data to be identified is determined to be target domain name data when the main domain name data is determined to have the target main domain name feature.
In an optional embodiment of the invention, the determining that the primary domain name data has a target primary domain name feature may include: acquiring a preset number of target position content characters in the main domain name data according to the characteristic content data; and under the condition that the target position content character is determined to belong to the characteristic content data, determining that the main domain name data has the target main domain name characteristic.
The feature content data may be specific content included at a specific position of the main domain name data of the specific type of target domain name data to be identified. The preset number may be a number determined according to the number of characters of the feature content data. The target position content characters may be the preset number of characters obtained from the specific position, where the feature content data may appear, in the main domain name data of the domain name data to be identified.
Accordingly, the target main domain name feature of the target domain name data may be that the feature content data is included at a specific position of its main domain name data. The preset number may therefore be determined according to the feature content data; optionally, the preset number may be the maximum number of characters among the items of feature content data. Once the position where the feature content data is located is determined, the preset number of target position content characters can be obtained from the main domain name data of the domain name data to be identified. Further, if the target position content characters are any item of the feature content data, this indicates that the domain name data to be identified has the specific feature of the target domain name data, and the domain name data to be identified can be determined to be target domain name data.
Optionally, a CDN domain name may be a domain name whose main domain name data consists of conventional main domain name data followed by the feature content data "cdn" or "dns" as its end characters, in which case the preset number of target position content characters are the last three characters of the main domain name data.
Illustratively, the main domain name data of the CDN domain name data "idv1 pcm.qininuds.com" is "qininuds". The last three characters of the main domain name data in the domain name data to be identified can be obtained as the target position content characters, and the domain name data to be identified is determined to be CDN domain name data if the target position content characters are determined to be "cdn" or "dns". When CDN domain names need to be identified among abnormal domain names, an illegal domain name whose main domain name data is a random character string does not have the target main domain name feature, so CDN domain names and illegal domain names can be distinguished among the abnormal domain names.
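The end-character check on the main domain name data can be sketched as below. This is only an illustration: the names `FEATURE_CONTENT`, `PRESET_NUMBER` and `has_target_main_feature` are hypothetical, the comparison is assumed to be case-insensitive, and the sample main domain names are invented for the example.

```python
# Feature content data named in the text, written in lowercase here.
FEATURE_CONTENT = ("cdn", "dns")

# Preset number: the maximum character count among the feature content items.
PRESET_NUMBER = max(len(item) for item in FEATURE_CONTENT)


def has_target_main_feature(main_domain: str) -> bool:
    """Return True if the last PRESET_NUMBER characters are feature content."""
    tail = main_domain.lower()[-PRESET_NUMBER:]
    return tail in FEATURE_CONTENT


# Invented main domain names, used only to exercise the check.
print(has_target_main_feature("examplecdn"))
print(has_target_main_feature("x7k2q9"))
```

Taking the preset number from the longest feature content item keeps the slice length in step with the feature content data if further items are added later.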
S230: When every sub domain name feature matching result is determined to be a successful match, determine the domain name data to be identified to be target domain name data.
Fig. 3 is a schematic flow chart of CDN domain name identification according to the second embodiment of the present invention. In a specific example, as shown in Fig. 3, to identify whether any domain name data to be identified is a CDN domain name, it may first be determined, for each domain name to be identified in the obtained original data set of domain names to be identified, whether the domain name has at most one level of sub domain name. If so, the main domain name may be extracted and it may be determined whether the main domain name ends with the characters "cdn" or "dns"; if it does, the domain name may be determined to be a CDN domain name. If a domain name to be identified has more than one level of sub domain name, whether it is formed by combining two conventional domain names can be further judged by determining whether the sub domain name includes root domain name content, and if so, the domain name to be identified can be determined to be a CDN domain name.
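The decision flow of Fig. 3 can be summarized in a short sketch. It is a simplified reading of the text, not the applicant's implementation: `is_cdn_domain`, `ROOT_LABELS`, `CDN_SUFFIXES` and `SUB_LEVEL_THRESHOLD` are hypothetical names, the root domain is assumed to be the single last label, and the set of root-domain labels is invented for illustration.

```python
ROOT_LABELS = {"com", "cn", "net", "org"}  # hypothetical feature structure data
CDN_SUFFIXES = ("cdn", "dns")              # end characters checked on the main domain
SUB_LEVEL_THRESHOLD = 2                    # "more than one level of sub domain name"


def is_cdn_domain(domain: str) -> bool:
    """Classify one domain name without any network access."""
    labels = domain.lower().strip(".").split(".")
    if len(labels) < 2:
        return False
    # Simplification: last label is the root domain, the one before it the main domain.
    main_domain = labels[-2]
    sub_levels = labels[:-2]
    if len(sub_levels) < SUB_LEVEL_THRESHOLD:
        # At most one sub-domain level: fall back to the main-domain suffix check.
        return main_domain.endswith(CDN_SUFFIXES)
    # Multi-level sub-domain: look for root-domain content, scanning from the end.
    return any(level in ROOT_LABELS for level in reversed(sub_levels))


for name in ("apkselfdl.ivo.com.cn.wsglb0.com", "img.examplecdn.com", "a1.b2.c3.x9z8.com"):
    print(name, is_cdn_domain(name))
```

Because the function only inspects the characters of the domain name, it can be applied to a whole data set in a loop or a list comprehension without issuing DNS or HTTP requests.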
The embodiment of the invention provides a domain name identification method. Sub domain name data of the domain name to be identified is determined, at least one sub domain name feature of the sub domain name data is obtained, and each sub domain name feature is matched with a target feature condition, which is determined according to a sub domain name level threshold and feature structure data, to obtain a sub domain name feature matching result. The domain name data to be identified is determined to be target domain name data when the sub domain name feature matching result is a successful match. Domain names are thus identified from their own features, which avoids the low identification efficiency and heavy network resource usage caused in the prior art by accessing the domain name and its IP address. Identification can be performed quickly and in batches without occupying network communication bandwidth, saving network resources and reducing the cost of domain name identification.
Example III
Fig. 4 is a schematic structural diagram of a domain name recognition device according to a third embodiment of the present invention, as shown in fig. 4, where the device includes: a sub domain name determination module 310, a feature matching module 320, and a first determination module 330.
The subdomain name determining module 310 is configured to determine subdomain name data in the domain name data to be identified.
The feature matching module 320 is configured to obtain at least one sub-domain name feature of the sub-domain name data, and match each sub-domain name feature with a target feature condition to obtain a sub-domain name feature matching result; the target characteristic conditions are determined according to the subdomain name level threshold value and characteristic structure data.
The first determining module 330 is configured to determine the domain name data to be identified as target domain name data if it is determined that the matching result of the features of the sub-domain names is successful.
In an alternative implementation manner of the embodiment of the present invention, the feature matching module 320 may include: the level number feature acquisition sub-module is used for acquiring the level number features of the subdomain name data; a level content data obtaining sub-module, configured to obtain at least one level content data of the sub-domain name data when it is determined according to the level number feature that the level number of the sub-domain name data is greater than or equal to the sub-domain name level threshold; and the sub-domain name feature matching sub-module is used for determining that the sub-domain name feature matching result is successful in matching under the condition that any one of the level content data comprises the feature structure data.
In an optional implementation manner of the embodiment of the present invention, the level number feature obtaining sub-module may specifically be used for: identifying level separation data in the sub-domain name data; determining that the number of levels is characterized by the number of levels being greater than or equal to the subdomain name level threshold, if the level separation data in the subdomain name data is determined to be greater than or equal to a separation number threshold; wherein the separation number threshold is predetermined according to the subdomain name level threshold.
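The separator-counting step can be sketched as follows. The relation between the two thresholds is an assumption made for this illustration: since n levels are joined by n - 1 separators, the separation number threshold is taken to be the sub domain name level threshold minus one. The function names and the default threshold of 2 are hypothetical.

```python
LEVEL_SEPARATOR = "."  # level separation data


def separation_threshold(level_threshold: int) -> int:
    """Derive the separator count threshold from the level threshold.

    Assumption: n levels are joined by n - 1 separators.
    """
    return max(level_threshold - 1, 0)


def meets_level_threshold(sub_domain: str, level_threshold: int = 2) -> bool:
    """Return True if the sub-domain has at least `level_threshold` levels."""
    if not sub_domain:
        return False  # no sub-domain at all
    separators = sub_domain.count(LEVEL_SEPARATOR)
    return separators >= separation_threshold(level_threshold)


print(meets_level_threshold("apkselfdl.ivo.com.cn"))
print(meets_level_threshold("www"))
```

Counting separators avoids splitting the string into levels when only the number of levels is needed.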
In an alternative implementation manner of the embodiment of the present invention, the level content data obtaining sub-module may specifically be used for: acquiring the level content data of the terminal level of the sub domain name data; and, when the level content data is determined not to include the feature structure data, repeatedly executing the step of acquiring the level content data of the previous level, until some level content data is determined to include the feature structure data or the level content data of every level is determined not to include the feature structure data.
In an optional implementation manner of the embodiment of the present invention, the apparatus may further include: the main domain name determining module is used for determining that the sub domain name characteristic matching result is a matching failure and acquiring main domain name data in the domain name data to be identified under the condition that the level number is smaller than the sub domain name level threshold; and the second determining module is used for determining that the domain name data to be identified is the target domain name data under the condition that the main domain name data is determined to have the target main domain name characteristics.
In an optional implementation manner of the embodiment of the present invention, the second determining module may specifically be configured to: acquiring a preset number of target position content characters in the main domain name data according to the characteristic content data; and under the condition that the target position content character is determined to belong to the characteristic content data, determining that the main domain name data has the target main domain name characteristic.
In an alternative implementation manner of the embodiment of the present invention, the sub domain name determining module 310 may specifically be configured to: identify terminal root domain name data in the domain name data to be identified, and determine root domain name separation data of the terminal root domain name data; determine main domain name data in the domain name data to be identified and main domain name separation data of the main domain name data according to the terminal root domain name data and the root domain name separation data, where the root domain name separation data is used for separating the terminal root domain name data from the main domain name data; and determine the sub domain name data in the domain name data to be identified according to the main domain name data and the main domain name separation data, where the main domain name separation data is used for separating the main domain name data from the sub domain name data.
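The separation of a domain name into root, main and sub domain name data can be sketched as below. This is an illustration under simplifying assumptions: the terminal root domain is taken to be the single last label, "." is used for both the root domain name separation data and the main domain name separation data, and the names `DomainParts` and `split_domain` are hypothetical.

```python
from typing import NamedTuple


class DomainParts(NamedTuple):
    sub_domain: str   # empty string if the domain has no sub-domain
    main_domain: str
    root_domain: str


def split_domain(domain: str, separator: str = ".") -> DomainParts:
    """Split a domain name from the end: root first, then main, then sub."""
    name = domain.lower().strip(separator)
    # The last separator divides the terminal root domain from the rest.
    rest, found, root = name.rpartition(separator)
    if not found:
        raise ValueError(f"no root domain separator in {domain!r}")
    # The next separator from the end divides the main domain from the sub-domain.
    sub, _, main = rest.rpartition(separator)
    return DomainParts(sub_domain=sub, main_domain=main, root_domain=root)


print(split_domain("apkselfdl.ivo.com.cn.wsglb0.com"))
```

Working from the end of the string means the number of sub-domain levels does not need to be known in advance.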
The device can execute the domain name identification method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of executing the method.
The embodiment of the invention provides a domain name recognition device. Sub domain name data of the domain name to be identified is determined, at least one sub domain name feature of the sub domain name data is obtained, and each sub domain name feature is matched with a target feature condition, which is determined according to a sub domain name level threshold and feature structure data, to obtain a sub domain name feature matching result. The domain name data to be identified is determined to be target domain name data when the sub domain name feature matching result is a successful match. Domain names are thus identified from their own features, which avoids the low identification efficiency and heavy network resource usage caused by accessing the domain name and its IP address. Identification can be performed quickly and in batches without occupying network communication bandwidth, saving network resources and reducing the cost of domain name identification.
Example IV
Fig. 5 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention. Fig. 5 illustrates a block diagram of an exemplary computer device 12 suitable for use in implementing embodiments of the present invention. The computer device 12 shown in fig. 5 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in FIG. 5, the computer device 12 is in the form of a general purpose computing device. Components of computer device 12 may include, but are not limited to: one or more processors 16, a memory 28, a bus 18 that connects the various system components, including the memory 28 and the processor 16.
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
Memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 5, commonly referred to as a "hard disk drive"). Although not shown in fig. 5, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.
The computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), one or more devices that enable a user to interact with the computer device 12, and/or any devices (e.g., network card, modem, etc.) that enable the computer device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Moreover, computer device 12 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through network adapter 20. As shown, network adapter 20 communicates with other modules of computer device 12 via bus 18. It should be appreciated that although not shown in fig. 5, other hardware and/or software modules may be used in connection with computer device 12, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processor 16 executes programs stored in the memory 28 to perform various functional applications and data processing, implementing the domain name recognition method provided by the embodiment of the present invention: determining sub-domain name data in domain name data to be identified; acquiring at least one sub domain name feature of the sub domain name data, and matching each sub domain name feature with a target feature condition to obtain a sub domain name feature matching result; the target characteristic conditions are determined according to the subdomain name level threshold value and characteristic structure data; and under the condition that the matching result of the features of all the subdomains is successful, determining the domain name data to be identified as target domain name data.
Example V
A fifth embodiment of the present invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the domain name identification method provided by the embodiment of the present invention: determining sub-domain name data in domain name data to be identified; acquiring at least one sub domain name feature of the sub domain name data, and matching each sub domain name feature with a target feature condition to obtain a sub domain name feature matching result; the target characteristic conditions are determined according to the subdomain name level threshold value and characteristic structure data; and under the condition that the matching result of the features of all the subdomains is successful, determining the domain name data to be identified as target domain name data.
Any combination of one or more computer readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in one or more programming languages, including object oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or computer device. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims (9)

1. A domain name identification method, comprising:
determining sub-domain name data in domain name data to be identified;
acquiring at least one sub domain name feature of the sub domain name data, and matching each sub domain name feature with a target feature condition to obtain a sub domain name feature matching result; the target characteristic conditions are determined according to the subdomain name level threshold value and characteristic structure data;
under the condition that the matching result of the characteristics of all the subdomains is successful, determining the domain name data to be identified as target domain name data;
the obtaining at least one sub domain name feature of the sub domain name data, and matching each sub domain name feature with a target feature condition to obtain a sub domain name feature matching result, including:
acquiring the level number characteristics of the subdomain name data;
acquiring at least one level content data of the sub domain name data under the condition that the level number of the sub domain name data is determined to be greater than or equal to the sub domain name level threshold according to the level number characteristics;
and under the condition that any one of the level content data comprises the characteristic structure data, determining that the matching result of the subdomain name characteristics is successful.
2. The method of claim 1, wherein the obtaining the level number characteristic of the sub-domain name data comprises:
identifying level separation data in the sub-domain name data;
determining that the number of levels is characterized by the number of levels being greater than or equal to the subdomain name level threshold, if the level separation data in the subdomain name data is determined to be greater than or equal to a separation number threshold; wherein the separation number threshold is predetermined according to the subdomain name level threshold.
3. The method of claim 1, wherein the obtaining at least one level of content data of the sub domain name data comprises:
acquiring the level content data of the terminal level of the subdomain name data;
and, when the level content data is determined not to include the feature structure data, repeatedly executing the step of acquiring the level content data of the previous level, until some level content data is determined to include the feature structure data or the level content data of every level is determined not to include the feature structure data.
4. The method according to claim 1, wherein after said obtaining at least one sub-domain name feature of said sub-domain name data and matching each of said sub-domain name features with a target feature condition, obtaining a sub-domain name feature matching result, further comprising:
under the condition that the number of levels is smaller than the threshold value of the sub-domain name levels, determining that the matching result of the sub-domain name features is a matching failure, and acquiring main domain name data in the domain name data to be identified;
and under the condition that the main domain name data is determined to have the target main domain name characteristics, determining the domain name data to be identified as the target domain name data.
5. The method of claim 4, wherein the determining that the primary domain name data has a target primary domain name characteristic comprises:
acquiring a preset number of target position content characters in the main domain name data according to the characteristic content data;
and under the condition that the target position content character is determined to belong to the characteristic content data, determining that the main domain name data has the target main domain name characteristic.
6. The method according to claim 1, wherein the determining sub-domain name data in the domain name data to be identified comprises:
identifying terminal root domain name data in the domain name data to be identified, and determining root domain name separation data of the terminal root domain name data;
determining main domain name data in the domain name data to be identified and main domain name separation data of the main domain name data according to the tail end root domain name data and the root domain name separation data; the root domain name separation data is used for separating the tail end root domain name data from the main domain name data;
determining the subdomain name data in the domain name data to be identified according to the main domain name data and the main domain name separation data; the main domain name separation data is used for separating the main domain name data from the sub domain name data.
7. A domain name recognition device, comprising:
the sub domain name determining module is used for determining sub domain name data in the domain name data to be identified;
the feature matching module is used for acquiring at least one sub-domain name feature of the sub-domain name data, and matching each sub-domain name feature with a target feature condition to obtain a sub-domain name feature matching result; the target characteristic conditions are determined according to the subdomain name level threshold value and characteristic structure data;
the first determining module is used for determining that the domain name data to be identified is target domain name data under the condition that the characteristic matching result of each subdomain name is determined to be successful in matching;
the feature matching module further includes:
the level number feature acquisition sub-module is used for acquiring the level number features of the subdomain name data;
a level content data obtaining sub-module, configured to obtain at least one level content data of the sub-domain name data when it is determined according to the level number feature that the level number of the sub-domain name data is greater than or equal to the sub-domain name level threshold;
and the sub-domain name feature matching sub-module is used for determining that the sub-domain name feature matching result is successful in matching under the condition that any one of the level content data comprises the feature structure data.
8. A computer device, the computer device comprising:
one or more processors;
a storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the domain name identification method of any of claims 1-6.
9. A computer storage medium having stored thereon a computer program, which when executed by a processor implements a domain name identification method as claimed in any of claims 1 to 6.
CN202111672272.2A 2021-12-31 2021-12-31 Domain name identification method, device, equipment and storage medium Active CN114363290B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111672272.2A CN114363290B (en) 2021-12-31 2021-12-31 Domain name identification method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114363290A CN114363290A (en) 2022-04-15
CN114363290B true CN114363290B (en) 2023-08-29

Family

ID=81104856

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111672272.2A Active CN114363290B (en) 2021-12-31 2021-12-31 Domain name identification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114363290B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115361358B (en) * 2022-08-19 2024-02-06 山石网科通信技术股份有限公司 IP extraction method and device, storage medium and electronic device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2555801A (en) * 2016-11-09 2018-05-16 F Secure Corp Identifying fraudulent and malicious websites, domain and subdomain names
CN109450886A (en) * 2018-10-30 2019-03-08 杭州安恒信息技术股份有限公司 A kind of domain name recognition methods, system and electronic equipment and storage medium
CN110008705A (en) * 2019-04-15 2019-07-12 北京微步在线科技有限公司 A kind of recognition methods of malice domain name, device and electronic equipment based on deep learning
CN110674370A (en) * 2019-09-23 2020-01-10 鹏城实验室 Domain name identification method and device, storage medium and electronic equipment
CN111800404A (en) * 2020-06-29 2020-10-20 深信服科技股份有限公司 Method and device for identifying malicious domain name and storage medium
CN111818198A (en) * 2020-09-10 2020-10-23 腾讯科技(深圳)有限公司 Domain name detection method, domain name detection device, equipment and medium
CN113329035A (en) * 2021-06-29 2021-08-31 深信服科技股份有限公司 Method and device for detecting attack domain name, electronic equipment and storage medium
CN113691489A (en) * 2020-05-19 2021-11-23 北京观成科技有限公司 Malicious domain name detection feature processing method and device and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150142608A1 (en) * 2013-11-18 2015-05-21 Andrew Horn System and method for identifying domain names

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
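Research and Implementation of Malicious Domain Name Detection Technology Based on Big Data Analysis; Yin Congxian (殷聪贤); China Master's Theses Full-text Database (Information Science and Technology); full text *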

Also Published As

Publication number Publication date
CN114363290A (en) 2022-04-15

Similar Documents

Publication Publication Date Title
US8364465B2 (en) Optimizing a language/media translation map
CN110445688B (en) Interface service function monitoring method and system based on data collection
CN112738102B (en) Asset identification method, device, equipment and storage medium
CN114363290B (en) Domain name identification method, device, equipment and storage medium
CN113139025B (en) Threat information evaluation method, device, equipment and storage medium
CN112769802B (en) Access verification method and device based on server, electronic equipment and storage medium
CN111400695B (en) Equipment fingerprint generation method, device, equipment and medium
CN110826036A (en) User operation behavior safety identification method and device and electronic equipment
CN110888791A (en) Log processing method, device, equipment and storage medium
CN111414263A (en) Information processing method, device, server and storage medium
CN112685255A (en) Interface monitoring method and device, electronic equipment and storage medium
CN112214770A (en) Malicious sample identification method and device, computing equipment and medium
CN109992960B (en) Counterfeit parameter detection method and device, electronic equipment and storage medium
CN115296895B (en) Request response method and device, storage medium and electronic equipment
CN114116811B (en) Log processing method, device, equipment and storage medium
CN103034854A (en) Image processing device and image processing method
CN113485835B (en) Method, system, equipment and medium for realizing memory sharing under multiple scenes
CN115643044A (en) Data processing method, device, server and storage medium
CN112866005B (en) Method, device and equipment for processing user access log and storage medium
CN110597724A (en) Calling method and device of application security test component, server and storage medium
CN111461873A (en) Verification method, device, server and storage medium of fund plan
CN109902176B (en) Data association expansion method and non-transitory computer instruction storage medium
CN113609352B (en) Character string retrieval method, device, computer equipment and storage medium
CN115022011A (en) Method, device, equipment and medium for identifying missed scanning software access request
CN111176611B (en) Method and device for generating random data set

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant