CN116760596A - Domain name category identification method and device and electronic equipment - Google Patents


Info

Publication number
CN116760596A
Authority
CN
China
Prior art keywords
domain name
target
result
category
matching
Legal status
Pending
Application number
CN202310714247.9A
Other languages
Chinese (zh)
Inventor
张婷
杨升
蒋宇轩
申勇
邱鹏飞
Current Assignee
Beijing Hillstone Networks Information Technology Co ltd
Original Assignee
Beijing Hillstone Networks Information Technology Co ltd
Application filed by Beijing Hillstone Networks Information Technology Co ltd
Priority to CN202310714247.9A
Publication of CN116760596A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/09 Mapping addresses
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/45 Network directories; Name-to-address mapping
    • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a domain name category identification method and device and electronic equipment. The method includes: acquiring a Domain Name System (DNS) message, where the DNS message carries a target domain name; matching the target domain name with a first domain name to obtain a first domain name matching result; when the first domain name matching result is that the target domain name does not match the first domain name, matching the target domain name with a second domain name to obtain a second domain name matching result; when the second domain name matching result is that the target domain name does not match the second domain name, extracting domain name features of the target domain name; and determining, according to the domain name features, a target domain name category identification result indicating whether the target domain name is a domain generation algorithm (DGA) domain name. The invention solves the technical problem of low identification accuracy when identifying DGA domain names in the related art.

Description

Domain name category identification method and device and electronic equipment
Technical Field
The invention relates to the field of domain name systems, in particular to a domain name category identification method, a domain name category identification device and electronic equipment.
Background
DNS (Domain Name System) is one of the most critical basic services of the Internet: it associates domain names with IP addresses. When a user needs to access a particular domain name, the name is resolved through the DNS protocol (port 53 by default), so that the corresponding target IP address can be obtained quickly; data can then be received from, and sent to, that IP address using other protocols. However, an attacker on the network may register or control certain domain names and bind them to the IP address of a botnet server in order to carry out a series of network attacks. Identifying and detecting such malicious domain names at the moment they are resolved makes it possible to discover or block these attacks.
Communication based on DGA (Domain Generation Algorithm) domain names is a common means by which attackers evade domain name blacklist detection. A DGA takes seeds such as strings, numbers, words from a particular English dictionary, and dates, and applies an encryption algorithm (such as exclusive-or operations) to produce a series of pseudorandom strings that serve as domain names. Because DGA domain names are generated pseudorandomly, they easily evade the traditional technique of detecting malicious domain names with a blacklist. Moreover, DGA domain names closely resemble the domain names that some cloud service providers use in large numbers for load balancing, which makes them even harder to identify. As a result, DGA domain name identification in the related art still suffers from low accuracy.
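To make the seed-plus-date idea concrete, here is a toy generator. It is a hypothetical illustration only, not the algorithm of any real malware family: `toy_dga` XOR-folds the seed and date bytes into a 32-bit state, then repeatedly applies a linear congruential step and an xorshift to pick letters. All names and constants are assumptions chosen for the sketch.

```python
from datetime import date

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def toy_dga(seed: str, day: date, count: int = 5, length: int = 12,
            tld: str = ".com") -> list[str]:
    """Derive `count` pseudorandom domain names from a seed string and a date.

    Hypothetical toy generator; it does not reproduce any real DGA family.
    """
    # XOR-fold the seed and date bytes into a 32-bit starting state.
    state = 0
    for i, byte in enumerate((seed + day.isoformat()).encode()):
        state ^= byte << (8 * (i % 4))
    domains = []
    for _ in range(count):
        chars = []
        for _ in range(length):
            # Linear congruential step, then an xorshift to mix high bits down.
            state = (state * 1103515245 + 12345) & 0xFFFFFFFF
            state ^= state >> 16
            chars.append(ALPHABET[state % len(ALPHABET)])
        domains.append("".join(chars) + tld)
    return domains


print(toy_dga("example-seed", date(2023, 6, 1), count=3))
```

Because the same seed and date always yield the same list, an infected host and its operator can compute the names independently, while a static blacklist has to keep chasing a new list every day.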
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiments of the invention provide a domain name category identification method, a domain name category identification device and electronic equipment, so as to at least solve the technical problem of low identification accuracy when identifying DGA domain names in the related art.
According to an aspect of an embodiment of the present invention, a domain name category identification method is provided, including: acquiring a Domain Name System (DNS) message, where the DNS message carries a target domain name; matching the target domain name with a first domain name to obtain a first domain name matching result, where the first domain name category confidence of the first domain name is greater than or equal to a preset confidence threshold, the first domain name category confidence indicates the confidence of the first domain name category identification result of the first domain name, and the first domain name category identification result indicates whether the first domain name is a domain name of a known category; when the first domain name matching result is that the target domain name does not match the first domain name, matching the target domain name with a second domain name to obtain a second domain name matching result, where the second domain name category confidence of the second domain name is less than the preset confidence threshold, the second domain name category confidence indicates the confidence of the second domain name category identification result of the second domain name, and the second domain name category identification result indicates whether the second domain name is a domain generation algorithm (DGA) domain name; when the second domain name matching result is that the target domain name does not match the second domain name, extracting domain name features of the target domain name; and determining, according to the domain name features, a target domain name category identification result indicating whether the target domain name is a DGA domain name.
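The three-stage flow described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: `CategoryRecord`, `identify` and the `classify` callback are hypothetical names, both stores are in-memory dictionaries keyed by exact domain name, and every recorded category is reduced to a DGA / non-DGA boolean for brevity.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CategoryRecord:
    is_dga: bool       # recorded category identification result
    confidence: float  # recorded category confidence in [0, 1]


def normalize(name: str) -> str:
    # DNS names are case-insensitive; drop a trailing root dot.
    return name.lower().rstrip(".")


def identify(target: str,
             first: dict[str, CategoryRecord],
             second: dict[str, CategoryRecord],
             classify: Callable[[str], bool]) -> tuple[bool, str]:
    """Return (is_dga, stage) for a target domain name.

    Assumes `first` holds records with confidence >= the preset threshold and
    `second` holds records below it; `classify` is the feature-based fallback.
    """
    name = normalize(target)
    if name in first:                   # step S104: high-confidence match
        return first[name].is_dga, "first"
    if name in second:                  # step S106: lower-confidence match
        return second[name].is_dga, "second"
    return classify(name), "features"   # steps S108 and S110
```

For example, with `first = {"example.com": CategoryRecord(False, 1.0)}`, the query name `Example.COM.` is settled at the first stage and never reaches the feature-based classifier; only names absent from both stores pay the cost of feature extraction.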
Optionally, before the target domain name is matched with the second domain name to obtain the second domain name matching result, the method includes: determining domain name parameters of the target domain name; determining whether the domain name parameters meet a first preset condition to obtain a first determination result; and when the first determination result is that the domain name parameters meet the first preset condition, matching the target domain name with the second domain name to obtain the second domain name matching result.
Optionally, before the target domain name is matched with the second domain name to obtain the second domain name matching result, the method includes: determining the IP address that sent the DNS message; acquiring the historical DNS messages sent by that IP address within a preset time range; determining, according to the historical DNS messages, a sent-message index corresponding to the IP address; determining whether the sent-message index meets a second preset condition to obtain a second determination result; and when the second determination result is that the sent-message index meets the second preset condition, matching the target domain name with the second domain name to obtain the second domain name matching result.
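The text does not define the sent-message index or the second preset condition. One plausible reading is sketched below, assuming the index is "number of queries and number of distinct names from one IP within a sliding time window"; the class `IpQueryWindow`, the function `meets_second_condition` and both thresholds are hypothetical.

```python
from collections import defaultdict, deque


class IpQueryWindow:
    """Keep each source IP's DNS queries from the last `window` seconds."""

    def __init__(self, window: float = 60.0):
        self.window = window
        self.history: dict[str, deque] = defaultdict(deque)

    def record(self, ip: str, timestamp: float, domain: str) -> None:
        queue = self.history[ip]
        queue.append((timestamp, domain))
        # Discard messages that fell out of the preset time range.
        while queue and queue[0][0] < timestamp - self.window:
            queue.popleft()

    def index(self, ip: str) -> dict[str, int]:
        # .get() avoids creating an empty entry for IPs never seen.
        queue = self.history.get(ip, deque())
        return {"queries": len(queue),
                "distinct_domains": len({d for _, d in queue})}


def meets_second_condition(index: dict[str, int], min_queries: int = 20,
                           min_distinct: int = 10) -> bool:
    # Hypothetical rule: many queries for many different names in one window.
    return (index["queries"] >= min_queries
            and index["distinct_domains"] >= min_distinct)
```

Old entries are pruned only when a new message from the same IP is recorded, so the index reflects the window ending at that IP's latest query; a production system would also need periodic cleanup of idle IPs.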
Optionally, after the target domain name category identification result is determined according to the domain name features, the method further includes: determining a target domain name category confidence for the target domain name category identification result.
Optionally, extracting the domain name features of the target domain name and determining, according to the domain name features, whether the target domain name is a DGA domain name include: inputting the target domain name into a feature extraction module of a domain name category identification model to obtain the domain name features, where the domain name category identification model is trained on sample data, the sample data includes a third domain name and an updated domain name category identification result corresponding to the third domain name, and the third domain name is a second domain name whose second domain name category identification result is inconsistent with its updated domain name category identification result; and inputting the domain name features into a domain name category identification module of the domain name category identification model to obtain the target domain name category identification result.
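The rule for selecting third domain names (second domain names whose recorded result disagrees with the updated result) can be sketched as a simple filter. How the updated results are produced is not specified here, so `updated_results` is just an assumed input; the function name is hypothetical and labels are reduced to DGA / non-DGA booleans.

```python
def select_third_domains(second_results: dict[str, bool],
                         updated_results: dict[str, bool]) -> list[tuple[str, bool]]:
    """Return (domain, updated_label) pairs for second domain names whose
    recorded DGA label disagrees with the updated label."""
    samples = []
    for domain, recorded in second_results.items():
        updated = updated_results.get(domain)
        # Domains without an updated result are skipped, not guessed.
        if updated is not None and updated != recorded:
            samples.append((domain, updated))
    return samples
```

Training only on the disagreements focuses the retraining data on the cases the earlier lower-confidence records got wrong, each paired with its corrected label.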
Optionally, after the target domain name is matched with the second domain name to obtain the second domain name matching result, the method further includes: when the second domain name matching result is that the target domain name does not match the second domain name, determining that the DNS message is a delayed-sending message, and determining the delayed sending time of that message.
Optionally, after the target domain name category identification result is determined according to the domain name features, the method further includes: when the target domain name category identification result is that the target domain name is a DGA domain name, sending alarm information carrying the target domain name to a preset terminal.
According to an aspect of an embodiment of the present invention, a domain name category identification apparatus is provided, including: an acquisition module, configured to acquire a DNS message, where the DNS message carries a target domain name; a first matching module, configured to match the target domain name with a first domain name to obtain a first domain name matching result, where the first domain name category confidence of the first domain name is greater than or equal to a preset confidence threshold, the first domain name category confidence indicates the confidence of the first domain name category identification result of the first domain name, and the first domain name category identification result indicates whether the first domain name is a domain name of a known category; a second matching module, configured to match the target domain name with a second domain name to obtain a second domain name matching result when the first domain name matching result is that the target domain name does not match the first domain name, where the second domain name category confidence of the second domain name is less than the preset confidence threshold, the second domain name category confidence indicates the confidence of the second domain name category identification result of the second domain name, and the second domain name category identification result indicates whether the second domain name is a DGA domain name; an extraction module, configured to extract domain name features of the target domain name when the second domain name matching result is that the target domain name does not match the second domain name; and a determining module, configured to determine, according to the domain name features, a target domain name category identification result indicating whether the target domain name is a DGA domain name.
According to an aspect of an embodiment of the present invention, there is provided an electronic apparatus including: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to implement the domain name category identification method of any of the above.
According to an aspect of an embodiment of the present invention, a computer-readable storage medium is provided; when the instructions in the storage medium are executed by a processor of an electronic device, they cause the electronic device to perform any one of the domain name category identification methods described above.
In the embodiments of the invention, a DNS message carrying a target domain name is acquired, and the target domain name is matched with a first domain name to obtain a first domain name matching result. The first domain name category confidence of the first domain name is greater than or equal to a preset confidence threshold, and the first domain name category identification result indicates whether the first domain name is a domain name of a known category. This step therefore determines whether the target domain name belongs to a known category. When the target domain name does not match the first domain name, that is, its category is not known, the target domain name is matched with a second domain name to obtain a second domain name matching result. The second domain name category confidence of the second domain name is less than the preset confidence threshold, and the second domain name category identification result indicates whether the second domain name is a DGA domain name. This step therefore determines whether the target domain name is a recorded DGA domain name or a recorded non-DGA domain name.
When the target domain name does not match the second domain name either, that is, it is neither a recorded DGA domain name nor a recorded non-DGA domain name, the domain name features of the target domain name are extracted and used to determine whether the target domain name is a DGA domain name. A domain name whose category still cannot be determined after the successive matching steps is thus classified from its own features, so its category can still be identified accurately. Conversely, if the category of the target domain name is determined during the matching steps, the result is obtained by comparison with domain names whose category identification results are already known, which substantially improves its accuracy. This solves the technical problem of low identification accuracy when identifying DGA domain names in the related art.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a flow chart of a domain name category identification method according to an embodiment of the present application;
FIG. 2 is a schematic view of an application scenario provided by an alternative embodiment of the present application;
FIG. 3 is a schematic diagram of a closed-loop detection method for DGA domain names according to an alternative embodiment of the present application;
FIG. 4 is a schematic diagram of a closed-loop detection apparatus for DGA domain name according to an alternative embodiment of the present application;
FIG. 5 is a block diagram of a domain name category identifying device according to an embodiment of the present application.
Detailed Description
To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without creative effort shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that, related information (including, but not limited to, user equipment information, user personal information, etc.) and data (including, but not limited to, data for presentation, analyzed data, etc.) related to the present disclosure are information and data authorized by a user or sufficiently authorized by each party. For example, an interface is provided between the system and the relevant user or institution, before acquiring the relevant information, the system needs to send an acquisition request to the user or institution through the interface, and acquire the relevant information after receiving the consent information fed back by the user or institution.
Example 1
In accordance with an embodiment of the present application, an embodiment of a domain name category identification method is provided. It should be noted that the steps shown in the flowcharts of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
Before the steps are described, the background of this solution is introduced. In the domain name system (DNS), DNS messages carrying a target domain name are frequently received. The DNS server responsible for the target domain name can be determined from that name, and the server resolves the IP address of the host corresponding to the target domain name, so that content or files can be obtained from that IP address and data can be sent to it over other protocols. However, some of these domain names may be malicious, for example DGA domain names. When such a domain name is resolved, the IP address it points to is obtained, and malware or other malicious content may then be downloaded directly from that address. The present application therefore provides a domain name category identification method that can accurately identify DGA domain names. The method is described below:
FIG. 1 is a flowchart of a domain name category identification method according to an embodiment of the present application. As shown in FIG. 1, the method includes the following steps:
Step S102, acquiring a DNS message, wherein the DNS message carries a target domain name;
In step S102 provided in the present application, the DNS message is a message received by the domain name system and is generally used to look up the IP address corresponding to the target domain name.
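Acquiring the target domain name from a DNS message amounts to reading the QNAME of the question section. A minimal sketch following the RFC 1035 wire format is shown below; `parse_qname` is a hypothetical helper, it reads only the first question, and it rejects compressed names rather than following pointers.

```python
import struct


def parse_qname(message: bytes) -> str:
    """Return the name in the first question of a raw DNS message (RFC 1035 wire format)."""
    if len(message) < 12:
        raise ValueError("shorter than the 12-byte DNS header")
    (qdcount,) = struct.unpack("!H", message[4:6])  # QDCOUNT field
    if qdcount == 0:
        raise ValueError("message has no question")
    labels, pos = [], 12  # the question section starts right after the header
    while True:
        if pos >= len(message):
            raise ValueError("truncated QNAME")
        length = message[pos]
        if length == 0:  # zero-length root label ends the name
            break
        if length & 0xC0:
            # Non-zero top bits mark a compression pointer or a reserved label type.
            raise ValueError("compressed or unsupported label type")
        label = message[pos + 1:pos + 1 + length]
        if len(label) != length:
            raise ValueError("truncated label")
        labels.append(label.decode("ascii"))
        pos += 1 + length
    return ".".join(labels).lower()


# Build a query for example.com (ID 0x1234, RD flag set, one question) and parse it.
header = struct.pack("!6H", 0x1234, 0x0100, 1, 0, 0, 0)
question = b"\x07example\x03com\x00" + struct.pack("!2H", 1, 1)  # QTYPE=A, QCLASS=IN
print(parse_qname(header + question))  # example.com
```

Lower-casing the result reflects the fact that DNS name comparison is case-insensitive, which keeps the later list lookups consistent.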
Step S104, matching the target domain name with the first domain name to obtain a first domain name matching result, wherein the first domain name category confidence of the first domain name is greater than or equal to a preset confidence threshold value, the first domain name category confidence is used for indicating the confidence of a first domain name category identification result of the first domain name, and the first domain name category identification result is a result of whether the first domain name is a known category domain name;
In step S104 provided in the present application, a first domain name category confidence greater than or equal to the preset confidence threshold means that the first domain name category identification result of the first domain name is highly accurate and essentially free of errors. The preset confidence threshold can be customized for the actual application and scenario; for example, it may be set to 99%. By matching the target domain name with the first domain name, this step therefore determines whether an accurate category identification result already exists for the target domain name. When the first domain name matching result is that the target domain name matches the first domain name, the target domain name category identification result is the first domain name category identification result. This judgment is fast, and because the first domain name category confidence is at least the preset confidence threshold, it is also highly reliable, so the resulting determination of whether the target domain name is a DGA domain name is reliable as well.
It should be noted that there may be multiple first domain names, and they may be retrieved from a trusted database. For example, when a first domain name is obtained from a whitelist database, its first domain name category identification result is a normal domain name; when a first domain name is obtained from a blacklist database, its first domain name category identification result is a specific category of abnormal domain name.
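Matching against first domain names drawn from a whitelist and a blacklist might look like the sketch below. It assumes exact, case-insensitive matching (no parent-domain or wildcard matching), that the whitelist is checked first, and that the blacklist maps each name to its abnormal category; `match_first_domain` and the category strings are hypothetical.

```python
from typing import Optional


def match_first_domain(target: str, whitelist: set[str],
                       blacklist: dict[str, str]) -> Optional[str]:
    """Look the target up in trusted lists; None means no first-domain match."""
    name = target.lower().rstrip(".")
    if name in whitelist:
        return "normal"
    if name in blacklist:
        return blacklist[name]  # the specific abnormal category recorded for it
    return None
```

A `None` result corresponds to "the target domain name does not match the first domain name", which is the condition for moving on to step S106.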
Step S106, under the condition that the first domain name matching result is that the target domain name is not matched with the first domain name, matching the target domain name with a second domain name to obtain a second domain name matching result, wherein the second domain name category confidence of the second domain name is smaller than a preset confidence threshold value, the second domain name category confidence represents the confidence of a second domain name category recognition result of the second domain name, and the second domain name category recognition result is the result of whether the second domain name is a DGA domain name or not;
In step S106 provided in the present application, when the first domain name matching result is that the target domain name does not match the first domain name, the category of the target domain name is not yet known with certainty. In this case, the target domain name is matched with the second domain name to obtain the second domain name matching result.
It should be noted that the second domain name category identification result of the second domain name is either that the second domain name is a DGA domain name or that it is not. Therefore, matching the target domain name with the second domain name reveals whether the target domain name is a DGA domain name.
It should further be noted that, although the second domain name category confidence of the second domain name is lower than the preset confidence threshold, the second domain name category identification result is still accurate enough that this step can quickly and accurately determine whether the target domain name is a DGA domain name.
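A store of second domain names could enforce the "below the threshold" property at insertion time, as in the following sketch. `SecondDomainStore` is a hypothetical name, the 0.99 default mirrors the example threshold given for step S104, and results are reduced to DGA / non-DGA booleans.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SecondDomainStore:
    """Recorded DGA/non-DGA results whose confidence is below the preset threshold."""
    threshold: float = 0.99
    records: dict[str, tuple[bool, float]] = field(default_factory=dict)

    def add(self, domain: str, is_dga: bool, confidence: float) -> None:
        # Results at or above the threshold belong with the first domain names.
        if not 0.0 <= confidence < self.threshold:
            raise ValueError("confidence must be in [0, threshold)")
        self.records[domain.lower().rstrip(".")] = (is_dga, confidence)

    def lookup(self, target: str) -> Optional[tuple[bool, float]]:
        """Return (is_dga, confidence) for a recorded domain, else None."""
        return self.records.get(target.lower().rstrip("."))
```

Returning the confidence alongside the label lets a caller decide, for instance, whether a lower-confidence hit should still trigger an alarm.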
Step S108, extracting domain name features of the target domain name when the second domain name matching result is that the target domain name does not match the second domain name;
In step S108 provided in the present application, when the target domain name does not match the second domain name, the preceding steps cannot directly establish whether the target domain name is a DGA domain name, so the domain name features of the target domain name are extracted. Because DGA domain names are generated by an algorithm, they exhibit regular patterns; the features extracted from the target domain name can therefore be used to determine whether it is a DGA domain name.
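The specific features are not listed here, so the sketch below uses common lexical features (length, Shannon entropy, digit ratio, vowel ratio, label count) as an assumption. `domain_features` is hypothetical, and for simplicity it treats the label just left of the last dot as the algorithm-controlled part; public suffixes such as `co.uk` are not handled.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def domain_features(domain: str) -> dict[str, float]:
    """Compute simple lexical features of the second-level label.

    Simplification: public suffixes (e.g. co.uk) are not recognized.
    """
    labels = domain.lower().rstrip(".").split(".")
    name = labels[-2] if len(labels) >= 2 else labels[0]
    n = len(name)
    counts = Counter(name)
    # Shannon entropy in bits per character; pseudorandom strings score high.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values()) if n else 0.0
    letters = [ch for ch in name if ch.isalpha()]
    return {
        "length": float(n),
        "entropy": entropy,
        "digit_ratio": sum(ch.isdigit() for ch in name) / n if n else 0.0,
        "vowel_ratio": (sum(ch in VOWELS for ch in letters) / len(letters)
                        if letters else 0.0),
        "label_count": float(len(labels)),
    }
```

Entropy and vowel ratio are popular choices because pseudorandom labels tend to use many distinct characters and few vowels, unlike human-chosen names; as noted in the background, some load-balancing names share these properties, so such features are an input to a model rather than a verdict on their own.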
Step S110, determining, according to the domain name features, the target domain name category identification result indicating whether the target domain name is a DGA domain name.
In step S110 provided in the present application, the target domain name category identification result, that is, whether the target domain name is a DGA domain name, is determined from the domain name features, thereby identifying the category of the target domain name.
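The domain name category identification module is a trained model whose form is not specified. As a stand-in, the sketch below uses a logistic score over a feature dictionary; the weights and bias are hypothetical hand-set values, not learned ones, and the returned confidence (the probability of the chosen label) is one possible way to obtain the target domain name category confidence.

```python
import math

# Hypothetical hand-set weights and bias; a trained model would learn these.
WEIGHTS = {"length": 0.15, "entropy": 1.2, "digit_ratio": 3.0, "vowel_ratio": -4.0}
BIAS = -4.0


def dga_probability(features: dict[str, float]) -> float:
    """Logistic score: estimated probability that the domain is DGA-generated."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def classify(features: dict[str, float], threshold: float = 0.5) -> tuple[bool, float]:
    """Return (is_dga, confidence), where confidence is the probability of the chosen label."""
    p = dga_probability(features)
    is_dga = p >= threshold
    return is_dga, p if is_dga else 1.0 - p
```

With these weights, a long, high-entropy, vowel-poor label such as `{"length": 16, "entropy": 3.8, "digit_ratio": 0.25, "vowel_ratio": 0.1}` scores above 0.9 and is labeled DGA, while a short, vowel-rich label is labeled non-DGA.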
Through the above steps, the target domain name carried in the acquired DNS message is first matched with the first domain name, whose first domain name category confidence is greater than or equal to the preset confidence threshold, to determine whether the target domain name belongs to a known category. If it does not match, it is matched with the second domain name, whose second domain name category confidence is less than the preset confidence threshold, to determine whether it is a recorded DGA domain name or a recorded non-DGA domain name. Only if it matches neither are its domain name features extracted and used to determine whether it is a DGA domain name.
Consequently, a category determined during the matching steps is backed by comparison with domain names whose category identification results are already known, which substantially improves its accuracy, while a domain name that remains undetermined after those steps is still classified accurately from its own features. This solves the technical problem of low identification accuracy when identifying DGA domain names in the related art.
As an optional embodiment, before the target domain name is matched with the second domain names to obtain the second domain name matching result, the method includes: determining domain name parameters of the target domain name; determining whether the domain name parameters meet a preset condition to obtain a first determination result; and, when the first determination result is that the domain name parameters meet the first preset condition, matching the target domain name with the second domain names to obtain the second domain name matching result.
In this embodiment, the domain name parameters of the target domain name are acquired and checked directly against the preset condition, and the outcome decides whether the target domain name needs further recognition. If the outcome shows that the target domain name is a normal domain name, or one that does not need to be recognized, it can be concluded that the target domain name is not a DGA domain name. If the target domain name is abnormal, or needs to be recognized, that conclusion cannot be drawn, and the target domain name is further matched with the second domain names to obtain the second domain name matching result.
For example, when the domain name parameter is the main body of the domain name, the decision can be based on the length of the main body. When the main body is shorter than 5 characters or longer than 63 characters, the domain name is deemed not to need recognition. A main body that is too long indicates that the target domain name cannot be resolved to an IP address, that is, it is an unused domain name. A main body that is too short usually belongs to a normal domain name that has already been registered. In either case, recognition of the target domain name can stop.
It should be noted that the domain name parameters of the target domain name may also include the suffix, the beginning, the word roots, and the like. When the parameters include the suffix and the suffix is a reverse-lookup suffix or another specific suffix, the target domain name is deemed normal and need not be recognized further. When the parameters include word roots and the root ratio is less than or equal to 0.4, the target domain name is deemed not to need recognition and no further recognition is performed. These rules can be customized for the actual application and scenario.
In this way, domain name formats that do not need further recognition can be screened out quickly, which speeds up obtaining the target domain name category recognition result.
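The length and suffix rules above can be sketched as a simple predicate. This is a minimal illustration under stated assumptions, not the implementation described in this application. It assumes that the "main body" is the label immediately to the left of the last label, so it ignores multi-label public suffixes such as co.uk. The suffix list and the names `domain_body` and `needs_dga_check` are hypothetical. The root-ratio rule is omitted because the text does not specify how the ratio is computed.

```python
# Minimal sketch of the domain-parameter preprocessing check (hypothetical names).
# Thresholds follow the example in the text; the suffix list is an assumption.
MIN_BODY_LEN = 5    # shorter bodies are treated as already-registered normal domains
MAX_BODY_LEN = 63   # longer bodies are treated as unusable domains
# Hypothetical "skip" suffixes: the IPv4/IPv6 reverse-lookup zones.
SKIP_SUFFIXES = (".in-addr.arpa", ".ip6.arpa")


def domain_body(name: str) -> str:
    """Return the assumed main body: the label just left of the top-level label."""
    labels = name.split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


def needs_dga_check(domain: str) -> bool:
    """Return True if the domain should continue to the second-domain matching stage."""
    name = domain.rstrip(".").lower()
    if name.endswith(SKIP_SUFFIXES):
        return False  # reverse-lookup or other specific suffix: treated as normal
    body = domain_body(name)
    if len(body) < MIN_BODY_LEN or len(body) > MAX_BODY_LEN:
        return False  # body length outside [5, 63]: no further recognition
    return True
```

For example, `needs_dga_check("1.0.168.192.in-addr.arpa")` and `needs_dga_check("abc.com")` both return False, while `needs_dga_check("www.example.com")` returns True.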
As an optional embodiment, before the target domain name is matched with the second domain names to obtain the second domain name matching result, the method includes the following steps:
determining the IP address that sent the DNS message;
acquiring the historical DNS messages sent by that IP address within a predetermined time range;
determining a sent-message index for the IP address from the historical DNS messages;
determining whether the sent-message index meets a preset condition to obtain a second determination result;
and, when the second determination result is that the sent-message index meets the second preset condition, matching the target domain name with the second domain names to obtain the second domain name matching result.

In this embodiment, the IP address that sent the DNS message is determined, and the historical DNS messages it sent within the predetermined time range are acquired. These are used to judge whether the DNS query behavior of that IP address is normal. If the IP address behaves abnormally, recognition continues; if it behaves normally, recognition is not performed. Concretely, the sent-message index for the IP address is computed from the historical DNS messages. Whether that index meets the preset condition then determines whether the IP address is normal or abnormal.
It should be noted that the sent-message index can be derived from several aspects of the historical DNS messages. For example, it may be the number of queries within the predetermined time range, or the number of queries for domain names that do not exist. For instance, recognition is not performed if the number of queries is less than 4. Likewise, it is not performed if the number of queries for non-existent domain names divided by the total number of queries is less than or equal to 20%.
In this way, normal IP addresses that do not need recognition can be screened out quickly, which speeds up obtaining the target domain name category recognition result.
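The per-IP statistics check can be sketched with a sliding window over recent queries. This is a hypothetical helper, not the application's implementation. The 300-second window stands in for the unspecified "predetermined time range". It is assumed that whether each earlier query hit a non-existent domain is known, for example from the NXDOMAIN response code of its reply. The thresholds 4 and 20% are taken from the example above.

```python
from collections import defaultdict, deque


class QueryStats:
    """Sliding-window DNS query statistics per source IP (hypothetical helper)."""

    def __init__(self, window_seconds=300.0, min_queries=4, min_nx_ratio=0.2):
        self.window = window_seconds        # assumed length of the predetermined time range
        self.min_queries = min_queries      # fewer queries than this: skip recognition
        self.min_nx_ratio = min_nx_ratio    # non-existent-domain ratio <= this: skip recognition
        # ip -> deque of (timestamp, was_nxdomain), oldest first
        self._history = defaultdict(deque)

    def _evict(self, q, now):
        # Drop entries older than the window so only recent history is counted.
        while q and q[0][0] < now - self.window:
            q.popleft()

    def record(self, ip, timestamp, was_nxdomain):
        q = self._history[ip]
        q.append((timestamp, was_nxdomain))
        self._evict(q, timestamp)

    def needs_dga_check(self, ip, now):
        """Return True if the IP's recent query behavior looks abnormal."""
        q = self._history[ip]
        self._evict(q, now)
        total = len(q)
        if total < self.min_queries:
            return False
        nx = sum(1 for _, was_nx in q if was_nx)
        return nx / total > self.min_nx_ratio
```

In this sketch, both conditions must be exceeded for recognition to continue. A host that made ten queries, only one of which failed, is treated as normal.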
As an optional embodiment, after whether the target domain name is a DGA domain name has been determined according to the domain name features, the method further includes: determining the confidence of the target domain name category recognition result.
In this embodiment, the target domain name category confidence of the recognition result is determined. When that confidence is below the preset confidence threshold but still above a lower confidence bound, the target domain name can be stored as a second domain name and take part in the matching stage of the method provided by the present application. That is, a further target domain name may arrive later and be matched against the second domain names. It is then compared with the earlier target domain name now stored as a second domain name. If the two are identical, the new target domain name receives the same category recognition result as the earlier one. Note that there are at least two second domain names, and the target domain name is compared with each of them so that none is omitted.
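The two-tier matching and the reuse of lower-confidence results can be sketched as follows. This is a minimal illustration, not the application's data structure. The class name `TwoTierMatcher`, the threshold 0.9 and the lower bound 0.6 are hypothetical. First domain names are modeled as known-category labels, such as whitelist or threat intelligence entries, that are assumed to meet the confidence threshold.

```python
class TwoTierMatcher:
    """Hypothetical sketch of first/second domain name matching."""

    def __init__(self, first_domains, threshold=0.9, floor=0.6):
        # First domain names: known-category entries assumed to have confidence >= threshold.
        self.first = dict(first_domains)   # domain -> category label
        # Second domain names: earlier model results with floor < confidence < threshold.
        self.second = {}                   # domain -> (is_dga, confidence)
        self.threshold = threshold
        self.floor = floor

    def lookup(self, domain):
        """Return ("first", label), ("second", is_dga), or None if unmatched."""
        if domain in self.first:
            return ("first", self.first[domain])
        if domain in self.second:
            return ("second", self.second[domain][0])
        return None  # unmatched: extract features and run the model

    def remember(self, domain, is_dga, confidence):
        """Store a model result as a second domain name if its confidence qualifies."""
        if self.floor < confidence < self.threshold:
            self.second[domain] = (is_dga, confidence)
```

A later query for the same domain name then reuses the stored result instead of running the model again.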
As an optional embodiment, extracting the domain name features corresponding to the target domain name, and determining from those features whether the target domain name is a DGA domain name, includes two steps. First, the target domain name is input into the feature extraction module of a domain name category recognition model to obtain the domain name features. The model is trained on sample data comprising third domain names and their corresponding updated domain name category recognition results. A third domain name is a second domain name whose second domain name category recognition result is inconsistent with its updated domain name category recognition result. Second, the domain name features are input into the domain name category recognition module of the model to obtain the target domain name category recognition result.
In this embodiment, a model is used both to extract the domain name features of the target domain name and to determine from them whether the target domain name is a DGA domain name. Using a model makes feature extraction more accurate and greatly speeds up this determination, which makes the method easier to implement.
Moreover, the domain name category recognition model is trained on sample data comprising third domain names and their updated domain name category recognition results. A third domain name is a second domain name whose second domain name category recognition result is inconsistent with its updated result. The updated result is obtained by re-checking a domain name category recognition result more carefully after it was produced, so it is more accurate. A second domain name whose recorded result disagrees with its updated result shows a case for which the model has not yet learned distinguishing features. Using such domain names as sample data in training therefore makes the model more accurate.
It should also be noted that, once the updated results are available, the recorded result of any second domain name that is inconsistent with its updated result can be replaced. The incorrect result is changed to the correct one, so that subsequent matching is more accurate.
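Selecting third domain names and correcting the stored second-domain results can be sketched in a few lines. This assumes that results are stored as booleans (is DGA or not) and that re-verified results come from some separate review process. The function name `select_third_domains` is hypothetical.

```python
def select_third_domains(second_results, updated_results):
    """Collect third domain names and correct the cached second-domain results.

    second_results: domain -> cached is_dga (mutated in place).
    updated_results: domain -> re-verified is_dga.
    Returns a list of (domain, updated_is_dga) training samples.
    """
    samples = []
    for domain, cached in list(second_results.items()):
        updated = updated_results.get(domain)
        if updated is not None and updated != cached:
            samples.append((domain, updated))   # third domain name: earlier result was wrong
            second_results[domain] = updated    # replace the incorrect result for future matching
    return samples
```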
As an optional embodiment, after the target domain name is matched with the second domain names and the second domain name matching result is obtained, the method further includes the following: when the second domain name matching result is that the target domain name does not match any second domain name, the DNS message is determined to be a delayed transmission message, and its delay transmission time is determined.
This embodiment describes how the DNS message is handled when the target domain name does not match any second domain name. In this case it must still be determined whether the target domain name is a DGA domain name. The DNS message is therefore marked as a delayed transmission message, its delay transmission time is determined, and its transmission is delayed accordingly. This leaves enough time to determine the domain name category of the target domain name. It also avoids problems such as malware infection caused by immediately forwarding a message for a DGA or other malicious domain name.
As an optional embodiment, after whether the target domain name is a DGA domain name has been determined according to the domain name features, the method further includes the following: when the target domain name category recognition result is that the target domain name is a DGA domain name, alarm information carrying the target domain name is sent to a predetermined terminal.
In this embodiment, when the target domain name is a DGA domain name, alarm information carrying the target domain name is sent promptly to the predetermined terminal. Operation and maintenance personnel using that terminal can then handle or verify the abnormal domain name in time. The actual handling can be customized for the specific application and scenario.
Based on the foregoing embodiments and optional embodiments, an optional implementation is provided, and is specifically described below.
In an optional embodiment, the present invention provides a closed-loop DGA domain name detection method and device, described as follows:
The complete detection flow comprises the following steps, each mapped to the corresponding step of the method described above.
First, the domain name in the DNS message is filtered using a domain name whitelist and threat intelligence. This corresponds to matching the target domain name with the first domain names.
Second, a DGA domain name preprocessing module preprocesses the domain name and filters out, once more, domain names that easily cause false alarms. This corresponds to the steps performed before matching the target domain name with the second domain names: determining the domain name parameters of the target domain name and determining the IP address that sent the DNS message.
Third, the local cache is searched for a detection result for the domain name. If one exists, the message is processed according to the cached result; otherwise, the domain name is recorded as a domain name to be detected. This corresponds to matching the target domain name with the second domain names to obtain the second domain name matching result.
Fourth, each domain name that passes the filtering and preprocessing flow is recorded as a domain name to be detected, and transmission of its DNS message is delayed. This corresponds to determining, when the target domain name does not match any second domain name, that the DNS message is a delayed transmission message, and determining its delay transmission time.
Fifth, the intelligent algorithm model of the DGA domain name detection module detects each domain name to be detected, and the detection result is cached locally. This corresponds to extracting the domain name features of the target domain name and determining from them whether the target domain name is a DGA domain name.
Finally, the detection results are fed back and analyzed, and falsely reported or missed domain names participate in updating the domain name whitelist or the intelligent algorithm model, which closes the detection loop. This corresponds to training the model with sample data that includes the third domain names.
The DGA domain name preprocessing module comprises 2 sub-modules:
(a) a domain name preprocessing sub-module (corresponding to determining whether the domain name parameters meet the preset condition); (b) a DNS query statistics preprocessing sub-module (corresponding to determining whether the sent-message index meets the preset condition).
The DGA domain name detection module comprises 3 sub-modules:
(a) a domain name feature extraction sub-module (corresponding to extracting the domain name features); (b) a DGA classification sub-module, which predicts whether the domain name is a DGA domain name (corresponding to determining, according to the domain name features, whether the target domain name is a DGA domain name); (c) a DGA multi-classification sub-module, which predicts the DGA family of each domain name detected as DGA.
This complete detection flow works together with the automatic update mechanism for the domain name whitelist, threat intelligence, the DGA classification model and the DGA multi-classification model, and with the administrator's ability to operate the detection device. Combined in this way, the closed-loop DGA domain name detection method and device provided by the invention can effectively detect DGA domain names in DNS queries.
Fig. 2 is a schematic diagram of an application scenario provided in an optional embodiment of the present invention, showing a typical scenario for DGA domain name detection. In the domain name system (DNS), DNS requests carrying a target domain name are frequently received; such a request can be understood as a DNS message. The DNS server responsible for the target domain name can be determined from the target domain name, and that server resolves the target domain name to the IP address of the corresponding host. This implements steps (1) and (2) shown in the figure. The client can then obtain content or files from that IP address, or send data to it over other protocols, which implements steps (3) and (4). However, some of these domain names may be malicious, for example DGA domain names. When the IP address of a DGA domain name is obtained, the client is likely to download malicious viruses or other malicious content directly from that IP address.
When a client host accesses an external server by domain name, it first obtains the IP address of the domain name through a DNS query. It then communicates with the server at that IP address. When the traffic generated in this process passes through the DGA domain name detection device, the device can perform DGA detection on the domain name during the DNS query.
When a client accesses a malicious DGA domain name, the resulting traffic roughly follows the flow shown in the figure. The DGA domain name detection device can complete DGA detection of the domain name during step (2). If the domain name is determined to be a DGA domain name, the device generates an alarm log or directly blocks the DNS response message, according to the handling mode specified by the administrator. In this way, subsequent attack behavior is discovered or blocked.
Based on the above scenario, the optional embodiment of the present invention provides a method and apparatus for detecting a DGA domain name in a closed loop, which can accurately identify the DGA domain name.
The following describes methods and apparatus provided by alternative embodiments of the present invention:
Fig. 3 is a schematic diagram of a closed-loop DGA domain name detection method according to an optional embodiment of the present invention. The steps are as follows:
Step one: first, compare and analyze the characteristics of normal domain names and DGA domain names, and of the DNS query traffic corresponding to each. Use this analysis to set the filtering conditions used in the subsequent DGA preprocessing stage. The detection flow starts when a DNS message arrives.
Step two: filter the domain name in each DNS message entering the detection flow using the domain name whitelist and threat intelligence. If the domain name hits the whitelist, it is considered a normal domain name and leaves the detection flow. If it hits threat intelligence, it is a known malicious domain name: a corresponding alarm log is generated to notify the system administrator, and the domain name leaves the detection flow.
The domain name whitelist in step two is a set of normal domain names. It includes a large number of commonly used domain names with high access popularity. It also includes some normal domain names whose characteristics resemble DGA domain names and that are therefore easily confused with them and falsely reported. Threat intelligence refers to known, top-ranked malicious domain names. Both the whitelist and threat intelligence can be updated dynamically in real time, and the administrator can manually add trusted whitelist entries and self-maintained threat intelligence.
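The step two prefilter can be sketched as a set lookup over a domain name and its parent domains. This is an illustration only. Matching parent domains, so that a whitelisted example.com also covers www.example.com, is an assumption, since the text does not say how entries are matched. The names `Verdict`, `candidate_names` and `prefilter` are hypothetical. The whitelist is checked before threat intelligence, following the order in step two.

```python
from enum import Enum


class Verdict(Enum):
    WHITELISTED = "whitelisted"   # normal domain: leave the detection flow
    THREAT = "threat"             # known malicious domain: alarm, then leave the flow
    CONTINUE = "continue"         # proceed to preprocessing (step three)


def candidate_names(domain):
    """Return the domain and all of its parent domains, e.g. a.b.com -> [a.b.com, b.com, com]."""
    labels = domain.rstrip(".").lower().split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


def prefilter(domain, whitelist, threat_intel):
    names = candidate_names(domain)
    if any(n in whitelist for n in names):
        return Verdict.WHITELISTED
    if any(n in threat_intel for n in names):
        return Verdict.THREAT
    return Verdict.CONTINUE
```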
Step three: the DNS message enters the preprocessing module, which comprises 2 sub-modules. (a) The domain name preprocessing sub-module preprocesses and checks the single domain name in the DNS message, and exits the detection flow if the domain name does not meet the DGA detection conditions. (b) The DNS query statistics preprocessing sub-module collects DNS query statistics for the source IP address that initiated the DNS query and evaluates them. If the statistics do not meet the DGA detection conditions, it exits the detection flow.
For the domain name preprocessing sub-module in step three, the preprocessing conditions include, but are not limited to, the following: the length of the domain name, whether it contains a special top-level domain, whether it begins with a special prefix, and its root ratio. A domain name that does not meet these conditions belongs to a type that is either easily falsely reported or trusted, and DGA domain name detection is not performed on it.
The DNS query statistics preprocessing sub-module in step three collects historical DNS query statistics for the source IP address that initiated the DNS query. These include the total number of queries and the number of queries for non-existent domain names within a period of time. The sub-module analyzes these statistics. If they do not meet the DGA detection conditions, the query behavior is considered normal and DGA domain name detection is not performed.
Step four: for a domain name that has passed the filtering in step two and the preprocessing in step three, first check whether a corresponding detection result is cached locally. If so, process the message according to the cached result. If not, record the domain name as a domain name to be detected, add it to the to-be-detected domain name list, and mark the DNS message containing it for delayed transmission.
The local cache in step four holds the detection results of domain names that have already gone through the DGA detection flow. It records whether each domain name is DGA and, if so, the DGA family it belongs to.
In step four, domain names that need DGA detection are added to the to-be-detected domain name list. The DGA detection module subsequently takes domain names from this list for model prediction, which makes detection asynchronous.
Delayed transmission in step four means the following: after the domain name is added to the to-be-detected list, the DNS message containing it is buffered and marked for delayed transmission. This ensures that the DGA detection flow does not block the processing of other subsequent messages.
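The local cache and the to-be-detected list can be sketched with a dictionary and a thread-safe queue. This is a minimal illustration, not the application's design. The `predict` callable is a hypothetical stand-in for the DGA models and returns `(is_dga, family)`. The class name `Detector` and the deduplication of queued names are also assumptions.

```python
import queue
import threading


class Detector:
    """Hypothetical sketch: local result cache plus an asynchronous to-be-detected list."""

    def __init__(self, predict):
        self.predict = predict          # stand-in model: domain -> (is_dga, family)
        self.cache = {}                 # domain -> (is_dga, family)
        self.pending = queue.Queue()    # the to-be-detected domain name list
        self._queued = set()            # avoid queueing the same domain twice
        self._lock = threading.Lock()

    def submit(self, domain):
        """Return the cached result, or queue the domain and return None (delay the message)."""
        with self._lock:
            if domain in self.cache:
                return self.cache[domain]
            if domain not in self._queued:
                self._queued.add(domain)
                self.pending.put(domain)
        return None

    def run_once(self):
        """Take one domain from the list, predict, and cache the result."""
        domain = self.pending.get()
        result = self.predict(domain)
        with self._lock:
            self.cache[domain] = result
            self._queued.discard(domain)
        self.pending.task_done()
        return domain, result


def worker_loop(detector):
    # Run detection in the background so message processing is never blocked.
    while True:
        detector.run_once()
```

In use, the worker would run as `threading.Thread(target=worker_loop, args=(detector,), daemon=True).start()`, while the packet path only calls `submit`.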
It should be noted that the delay time delay_time (in milliseconds) of a delayed message is determined dynamically from two values: the number m of domain names currently being detected, and the time t (in milliseconds) required to detect a single domain name. Specifically, delay_time = min(m × t, t_max), where t_max is the maximum delay.
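The delay formula translates directly into code. The function name and the example values below (t = 5 ms, t_max = 200 ms) are hypothetical and chosen only to show both branches of the `min`.

```python
def delay_time_ms(m: int, t_ms: float, t_max_ms: float) -> float:
    """delay_time = min(m * t, t_max), all times in milliseconds."""
    if m < 0 or t_ms < 0 or t_max_ms < 0:
        raise ValueError("m, t and t_max must be non-negative")
    return min(m * t_ms, t_max_ms)


# With 20 domains queued at 5 ms each, the message waits 100 ms;
# with 100 domains queued, the wait is capped at t_max = 200 ms.
print(delay_time_ms(20, 5, 200))   # 100
print(delay_time_ms(100, 5, 200))  # 200
```

A caller could turn this into a release deadline, for example `time.monotonic() + delay_time_ms(m, t, t_max) / 1000`.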
Step five: the DGA domain name detection module takes each domain name to be detected from the to-be-detected list and passes it through the following sub-modules in order. (a) The domain name feature extraction sub-module extracts the relevant features of the domain name and generates a feature vector F. (b) The DGA classification sub-module takes F as input and uses a DGA classification model to predict whether the domain name is a DGA domain name. (c) For a domain name detected as DGA, the DGA multi-classification sub-module takes F as input and uses a DGA multi-classification model to predict the DGA family it belongs to.
The intelligent algorithm models for DGA detection in step five are trained by the intelligent algorithm training module on a collected domain name data set, using specific algorithms and parameters. They can be retrained and updated periodically.
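Step five can be sketched as a feature extractor feeding two pluggable models. The application does not list its features. The ones below (body length, digit ratio, vowel ratio, Shannon entropy, longest consonant run) are common hand-crafted DGA features and are used here purely as an assumption. `is_dga_model` and `family_model` are hypothetical stand-ins for the trained DGA classification and multi-classification models.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def extract_features(domain):
    """Build an illustrative feature vector F from the domain's main body."""
    labels = domain.rstrip(".").lower().split(".")
    body = labels[-2] if len(labels) >= 2 else labels[0]   # assumed main body
    n = len(body)
    if n == 0:
        return [0.0, 0.0, 0.0, 0.0, 0.0]
    counts = Counter(body)
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    digits = sum(ch.isdigit() for ch in body)
    vowels = sum(ch in VOWELS for ch in body)
    run = longest = 0
    for ch in body:                      # longest run of consecutive consonants
        if ch.isalpha() and ch not in VOWELS:
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return [float(n), digits / n, vowels / n, entropy, float(longest)]


def detect(domain, is_dga_model, family_model):
    """Binary DGA prediction first; family prediction only for DGA results."""
    f = extract_features(domain)
    if not is_dga_model(f):
        return False, None
    return True, family_model(f)
```

For example, "www.google.com" yields a body of length 6 with vowel ratio 0.5 and entropy of about 1.92 bits, whereas a random-looking body such as "xkqjzvbwpl" has ten distinct letters and entropy of about 3.32 bits.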
Step six: cache the DGA detection results from step five locally. Handle the messages whose transmission was delayed in step four accordingly, and generate alarm logs for DGA domain names.
In step six, the local cache stores domain names whose detection result is DGA as well as those whose result is non-DGA. This effectively improves detection efficiency for domain names that are accessed frequently in the network environment. Delayed messages are handled as follows. A message whose domain name is detected as non-DGA is forwarded directly. For a message whose domain name is detected as DGA, an alarm log is generated to notify the administrator, and the message is forwarded or blocked according to the administrator's preset.
Step seven: verify and analyze the domain names detected in the preceding steps. If a domain name detected as DGA turns out to be a false alarm, add it to the domain name whitelist. If a domain name detected as non-DGA turns out to have been missed, add it to the training data set of the intelligent algorithm model so that it participates in subsequent training and updating of the model.
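The handling of a buffered message once its result is known can be sketched as a small decision function. The `block_dga` flag stands in for the administrator's preset, and the `alarms` list stands in for a real alarm-log channel. Both, like the names `Action` and `handle_delayed`, are hypothetical.

```python
from enum import Enum


class Action(Enum):
    FORWARD = "forward"
    BLOCK = "block"


def handle_delayed(domain, is_dga, block_dga, alarms):
    """Decide what to do with a buffered DNS message once detection finishes."""
    if not is_dga:
        return Action.FORWARD                        # non-DGA: forward directly
    alarms.append(f"DGA domain detected: {domain}")  # DGA: always raise an alarm log
    return Action.BLOCK if block_dga else Action.FORWARD
```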
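The feedback rule in step seven can be sketched as follows. This assumes that each reviewed domain comes with both the model's prediction and a verified label from the administrator or an analyst. The function name `apply_feedback` is hypothetical.

```python
def apply_feedback(reviews, whitelist, training_set):
    """reviews: iterable of (domain, predicted_is_dga, verified_is_dga) tuples."""
    for domain, predicted, verified in reviews:
        if predicted and not verified:
            whitelist.add(domain)                 # false alarm: add to the whitelist
        elif not predicted and verified:
            training_set.append((domain, True))   # missed DGA: add as a positive sample
```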
Fig. 4 is a schematic diagram of a closed-loop DGA domain name detection device according to an optional embodiment of the present invention. As shown in Fig. 4, the device is divided into six units:
(1) A domain name prefilter unit;
The domain name prefilter unit (1) filters the domain names in DNS messages entering the closed-loop DGA domain name detection device. It uses the domain name whitelist to filter out DNS messages for normal domain names, which are forwarded directly without entering subsequent units. It uses threat intelligence to discover known malicious domain names in advance. For each message carrying such a domain name, it generates an alarm log to notify the administrator and forwards or blocks the message according to the administrator's preset.
(2) A DNS message preprocessing unit;
The DNS message preprocessing unit (2) preprocesses each message using two inputs: some features of the domain name in the current message, and historical DNS query statistics over a recent period for the source IP of the current DNS query. Messages that do not meet the DGA detection conditions do not enter subsequent units and are forwarded directly, which saves time. The domain names of messages that do meet the conditions are added to the to-be-detected domain name list and processed by the subsequent DGA domain name detection unit.
(3) A DGA domain name detection unit;
The DGA domain name detection unit (3) is the core unit for detecting DGA domain names. It uses the DGA intelligent algorithm models to predict on each domain name to be detected, and outputs and caches the detection result.
(4) A DNS message delay sending unit;
The DNS message delay sending unit (4) buffers the DNS message containing each domain name to be detected, then decides how to handle the message according to the DGA detection result. If the domain name is detected as DGA, the message is forwarded or blocked according to the administrator's preset. If it is detected as non-DGA, the message is forwarded directly.
(5) An intelligent model training unit;
The intelligent model training unit (5) can periodically retrain and update the models using the latest DGA domain name data set and algorithm parameters. It also receives feedback on detection results from the DGA domain name detection unit and analyzes the fed-back domain names for false alarms and missed detections. Falsely reported domain names are added to the domain name whitelist, and missed domain names are added to the DGA domain name data set for model training and updating.
(6) An administrator interaction unit.
The administrator interaction unit (6) allows the administrator to operate the closed-loop DGA domain name detection device. Through it, the administrator can manually add or delete domain name whitelist and threat intelligence entries in unit (1), and can set the relevant preprocessing conditions in unit (2). The administrator can also preset how messages containing DGA domain names are handled, that is, whether they are forwarded directly or blocked.
The above optional embodiments achieve at least the following advantages:
(1) The domain name whitelist and threat intelligence filter out most DNS messages for normal domain names and known malicious domain names in the DNS traffic. This greatly reduces the number of messages entering the subsequent detection flow and effectively avoids false alarms caused by normal domain names whose characteristics resemble DGA domain names.
(2) The DGA domain name preprocessing module preprocesses the current domain name according to some of its features and to DNS query statistics, filtering out once more those domain names that easily cause false alarms.
(3) Domain names that pass filtering and preprocessing are recorded for subsequent DGA detection, and the DNS messages containing them are marked for delayed transmission. The delayed transmission mechanism ensures that DNS messages for DGA domain names are handled in time, so that the first DNS query message for a DGA domain name is not let through. At the same time, processing of other subsequent messages is neither blocked nor affected, which helps system performance.
(4) After DGA detection is complete, the delayed sending mechanism releases, alarms on, or blocks each message according to the detection result. This ensures that the DNS message in which a DGA domain name first appears is handled in time without blocking the processing of other messages.
(5) A feedback mechanism for detection results lets the intelligent algorithm model training module analyze those results. Falsely reported domain names are added to the domain name whitelist, which effectively prevents repeated false alarms on the same domain names. Missed domain names are added to the DGA domain name data set to participate in the next round of model training, which reduces the chance of subsequent missed detections.
(6) The method detects new DGA domain names well even when they appear for the first time, with a high detection rate, a low false alarm rate, and high detection speed.
(7) The method helps system administrators quickly discover and promptly handle DGA domain name threats in the network environment, protecting intranet data and host security.
It should be noted that, for simplicity of description, the foregoing method embodiments are described as a series of actions, but those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or concurrently. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
From the description of the above embodiments, those skilled in the art can clearly understand that the method according to the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium (e.g., ROM/RAM, magnetic disk, or optical disk) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to perform the methods of the embodiments of the present invention.
Example 2
According to an embodiment of the present invention, an apparatus for implementing the above domain name category identification method is further provided. Fig. 5 is a block diagram of a domain name category identification apparatus according to an embodiment of the present invention. As shown in Fig. 5, the apparatus includes an acquisition module 502, a first matching module 504, a second matching module 506, an extraction module 508, and a determination module 510, which are described in detail below.
The acquisition module 502 is configured to acquire a domain name system (DNS) message, where the DNS message carries a target domain name.
Optionally, the DNS message is a message received by the domain name system and is typically used to query the IP address corresponding to the target domain name.
The first matching module 504 is connected to the acquisition module 502 and is configured to match the target domain name with a first domain name to obtain a first domain name matching result. The first domain name category confidence of the first domain name is greater than or equal to a preset confidence threshold; this confidence represents the reliability of the first domain name category identification result of the first domain name, and the first domain name category identification result indicates whether the first domain name is a domain name of a known category.
Optionally, a first domain name category confidence greater than or equal to the preset confidence threshold means that the first domain name category identification result of the first domain name is highly accurate, essentially error-free. The preset confidence threshold can be customized for the actual application and scenario, for example set to 99%. By matching the target domain name with the first domain name, this step therefore determines whether an accurate domain name category identification result already exists for the target domain name. When the first domain name matching result is that the target domain name matches the first domain name, the target domain name category identification result of the target domain name is the first domain name category identification result. This judgment is fast, and because the first domain name category confidence is at or above the preset confidence threshold, it is also very reliable, so the resulting target domain name category identification result, that is, whether the target domain name is a DGA domain name, is reliable as well.
It should be noted that there may be multiple first domain names, and they may be retrieved from a trusted database. For example, when a first domain name is obtained from a whitelist database, its first domain name category identification result is a normal domain name; when a first domain name is obtained from a blacklist database, its first domain name category identification result is a specific abnormal domain name.
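The first-stage match can be sketched as a lookup in a table of first domain names that carry a category and a confidence. This is an assumption-laden illustration: exact, case-insensitive matching, the `(category, confidence)` tuple layout, and the function name `first_match` are hypothetical; only the 99% threshold comes from the text.

```python
from typing import Dict, Optional, Tuple

PRESET_CONFIDENCE_THRESHOLD = 0.99  # example value given in the text


def first_match(domain: str,
                known: Dict[str, Tuple[str, float]],
                threshold: float = PRESET_CONFIDENCE_THRESHOLD) -> Optional[str]:
    """Return the known category if the domain matches a high-confidence first domain name."""
    entry = known.get(domain.lower().rstrip("."))
    if entry is None:
        return None  # no match: fall through to the second-stage match
    category, confidence = entry
    # Only entries at or above the preset threshold count as first domain names.
    return category if confidence >= threshold else None
```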
The second matching module 506 is connected to the first matching module 504 and is configured to, when the first domain name matching result is that the target domain name does not match the first domain name, match the target domain name with a second domain name to obtain a second domain name matching result. The second domain name category confidence of the second domain name is less than the preset confidence threshold; it represents the confidence of the second domain name category identification result of the second domain name, and the second domain name category identification result indicates whether the second domain name is a domain generation algorithm (DGA) domain name.
Optionally, when the first domain name matching result is that the target domain name does not match the first domain name, the category of the target domain name is not known with certainty. In this case, the target domain name is matched with the second domain name to obtain a second domain name matching result.
It should be noted that the second domain name category identification result of the second domain name is either that the second domain name is a DGA domain name or that it is not. Therefore, matching the target domain name with the second domain name reveals whether the target domain name is a DGA domain name.
It should be further noted that, although the second domain name category confidence of the second domain name is less than the preset confidence threshold, the second domain name category identification result is still fairly accurate, so this step can still quickly and accurately determine the target domain name category identification result, that is, whether the target domain name is a DGA domain name.
The extraction module 508 is connected to the second matching module 506 and is configured to extract domain name features corresponding to the target domain name when the second domain name matching result is that the target domain name does not match the second domain name.
Optionally, when the second domain name matching result is that the target domain name does not match the second domain name, the preceding steps cannot directly establish whether the target domain name is a DGA domain name, so the domain name features corresponding to the target domain name are extracted. Because a DGA domain name is generated by an algorithm, it exhibits regular patterns; the extracted domain name features can therefore be used to determine the target domain name category identification result, that is, whether the target domain name is a DGA domain name.
The determination module 510 is connected to the extraction module 508 and is configured to determine, according to the domain name features, a target domain name category identification result indicating whether the target domain name is a DGA domain name.
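The module chain 504, 506, 508, 510 can be sketched as a three-stage cascade that stops at the first stage able to answer. This is a simplified illustration, not the patented implementation: plain dictionaries stand in for the first and second domain name stores, and `extract` and `classify` are hypothetical callables standing in for the model.

```python
from typing import Callable, Dict, List


def identify(domain: str,
             first_db: Dict[str, str],
             second_db: Dict[str, bool],
             extract: Callable[[str], List[float]],
             classify: Callable[[List[float]], bool]) -> Dict[str, object]:
    """Run the cascade and report which stage produced the result."""
    d = domain.lower().rstrip(".")
    # Stage 1 (module 504): high-confidence first domain names, e.g. white/blacklists.
    if d in first_db:
        return {"stage": "first", "category": first_db[d]}
    # Stage 2 (module 506): lower-confidence second domain names with a DGA verdict.
    if d in second_db:
        return {"stage": "second", "is_dga": second_db[d]}
    # Modules 508 and 510: feature extraction, then classification.
    return {"stage": "model", "is_dga": classify(extract(d))}
```

Ordering the cheap lookups before feature extraction means the model only sees domain names that neither store could resolve.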
Optionally, the second matching module 506 is further configured to: determine a domain name parameter of the target domain name; determine whether the domain name parameter meets a preset condition to obtain a first determination result; and, when the first determination result is that the domain name parameter meets a first preset condition, match the target domain name with the second domain name to obtain the second domain name matching result.
Optionally, by acquiring the domain name parameter of the target domain name, it can be directly determined whether the parameter meets the preset condition, and the result decides whether identification of the target domain name continues. If the result shows that the target domain name is a normal domain name or one that does not need to be identified, it can be concluded that the target domain name is not a DGA domain name. If the target domain name is abnormal or needs to be identified, it cannot be ruled out as a DGA domain name, and the target domain name is then matched with the second domain name to obtain the second domain name matching result.
For example, when the domain name parameter is the main body of the domain name, the determination can be based on the length of the main body: when the main body is shorter than 5 characters or longer than 63 characters, the domain name is judged not to need identification. When the main body is too long, it can be directly concluded that the target domain name cannot be resolved to an IP address, that is, it is an unused domain name; when the main body is too short, the domain name is usually a normal domain name that has already been registered. In either case, identification of the target domain name need not continue.
It should be noted that the domain name parameters of the target domain name may also include a suffix, a prefix, a word root, and the like. When the domain name parameters include the suffix and the suffix is a reverse-lookup suffix or a specific designated suffix, the domain name is judged to be normal, so identification of the target domain name need not continue. When the domain name parameters include the word root and the word-root ratio is less than or equal to 0.4, the target domain name is judged not to need identification and no subsequent identification is performed. These conditions can be customized for the actual application and scenario.
In this way, domain name formats that do not need further identification can be screened out quickly, which speeds up obtaining the target domain name category identification result.
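The length and suffix rules above can be sketched as a pre-filter. Assumptions: the "main body" is taken naively as the label just left of the last label (this ignores multi-label public suffixes such as `co.uk`), bare single-label names are skipped, the reverse-lookup suffixes are the standard `in-addr.arpa` and `ip6.arpa` zones, and `skip_suffixes` is a hypothetical stand-in for the "specific suffix" list. The word-root rule is omitted because the text does not define how the ratio is computed.

```python
REVERSE_SUFFIXES = (".in-addr.arpa", ".ip6.arpa")


def needs_dga_check(domain: str, min_len: int = 5, max_len: int = 63,
                    skip_suffixes: tuple = ()) -> bool:
    """Return True if the domain name should continue to second-stage matching."""
    d = domain.lower().rstrip(".")
    # Reverse-lookup or administrator-designated suffixes are treated as normal.
    if d.endswith(REVERSE_SUFFIXES + tuple(skip_suffixes)):
        return False
    labels = d.split(".")
    if len(labels) < 2:
        return False  # assumption: bare single-label names are not checked
    body = labels[-2]  # naive "main body": the label just left of the TLD
    # Too short: usually an already registered normal name; too long: cannot resolve.
    if len(body) < min_len or len(body) > max_len:
        return False
    return True
```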
Optionally, the second matching module 506 is further configured to: determine the IP address that sent the DNS message; acquire the historical DNS messages sent from that IP address within a predetermined time range; determine a sent-message indicator corresponding to the IP address according to the historical DNS messages; determine whether the sent-message indicator meets a preset condition to obtain a second determination result; and, when the second determination result is that the sent-message indicator meets a second preset condition, match the target domain name with the second domain name to obtain the second domain name matching result.
Optionally, the IP address of the DNS message is the IP address that sent it. By acquiring the historical DNS messages sent from that IP address within the predetermined time range, it can be determined whether the DNS query behavior of the IP address is normal: if the IP address is abnormal, identification continues; if it is normal, identification is skipped. Concretely, the sent-message indicator corresponding to the IP address is determined from the historical DNS messages, and whether the indicator meets the preset condition identifies the IP address as normal or abnormal.
It should be noted that the sent-message indicator can be derived from several aspects of the historical DNS messages. For example, it may be the number of queries within the predetermined time range, or the number of queries for domain names that do not exist. For instance, identification is skipped if the number of queries is less than 4, or if the number of queries for non-existent domain names divided by the total number of domain name queries is less than or equal to 20%.
In this way, normal IP addresses that do not need identification can be screened out quickly, which speeds up obtaining the target domain name category identification result.
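The source-IP check can be sketched with a sliding time window per IP address. The thresholds (fewer than 4 queries, non-existent-domain ratio of at most 20%) come from the example above; the 300-second window, the class name `SourceBehaviorFilter`, and recording each query as a `(timestamp, nxdomain)` pair are hypothetical choices for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple


class SourceBehaviorFilter:
    def __init__(self, window_s: float = 300.0, min_queries: int = 4,
                 min_nx_ratio: float = 0.2) -> None:
        self.window_s = window_s          # hypothetical "predetermined time range"
        self.min_queries = min_queries
        self.min_nx_ratio = min_nx_ratio
        # Per source IP: (timestamp, whether the query was for a non-existent domain).
        self._history: Dict[str, Deque[Tuple[float, bool]]] = defaultdict(deque)

    def record(self, src_ip: str, nxdomain: bool, ts: Optional[float] = None) -> None:
        self._history[src_ip].append((time.time() if ts is None else ts, nxdomain))

    def needs_dga_check(self, src_ip: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        q = self._history[src_ip]
        # Drop queries that fall outside the time window.
        while q and now - q[0][0] > self.window_s:
            q.popleft()
        total = len(q)
        if total < self.min_queries:
            return False  # too few queries: treated as normal
        nx = sum(1 for _, is_nx in q if is_nx)
        # Continue only if the non-existent-domain ratio exceeds the threshold.
        return nx / total > self.min_nx_ratio
```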
Optionally, the determination module 510 is further configured to determine the target domain name category confidence of the target domain name category identification result.
Optionally, after the target domain name category confidence is determined, if it is less than the preset confidence threshold but still greater than a certain lower confidence threshold, the target domain name is added as a second domain name and participates in the matching stage of the method provided by the application. That is, when another target domain name arrives and is matched against the second domain names, it is also compared with the earlier target domain names that have become second domain names; if it matches one of them, it receives the same target domain name category identification result as that earlier target domain name. It should be noted that there are then at least two second domain names, and the target domain name is compared with each of them so that none is missed.
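Promoting model results into the second-domain-name store can be sketched as below. The 0.99 preset threshold follows the earlier example, but the 0.8 lower threshold and the class name `SecondDomainStore` are hypothetical; the text only says "a certain confidence threshold". Results at or above the preset threshold are not handled here, since the text does not say where they go.

```python
from typing import Dict, Optional, Tuple


class SecondDomainStore:
    def __init__(self, preset_threshold: float = 0.99, lower_threshold: float = 0.8) -> None:
        self.preset = preset_threshold
        self.lower = lower_threshold  # hypothetical value
        self._entries: Dict[str, Tuple[bool, float]] = {}

    def maybe_add(self, domain: str, is_dga: bool, confidence: float) -> bool:
        """Store a model result only if lower < confidence < preset."""
        if self.lower < confidence < self.preset:
            self._entries[domain.lower().rstrip(".")] = (is_dga, confidence)
            return True
        return False

    def lookup(self, domain: str) -> Optional[bool]:
        """Return the stored DGA verdict for a later target domain name, if any."""
        entry = self._entries.get(domain.lower().rstrip("."))
        return None if entry is None else entry[0]
```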
Optionally, the extraction module 508 is further configured to input the target domain name into a feature extraction module of a domain name category identification model to obtain the domain name features. The domain name category identification model is trained on sample data; the sample data includes a third domain name and an updated domain name category identification result corresponding to the third domain name, where the third domain name is a second domain name whose second domain name category identification result is inconsistent with its updated domain name category identification result.
Optionally, the determination module 510 is further configured to input the domain name features into a domain name category identification module of the domain name category identification model to obtain the target domain name category identification result.
Optionally, extracting the domain name features corresponding to the target domain name and determining from them whether the target domain name is a DGA domain name are both carried out by a model. Using a model makes feature extraction more accurate and greatly speeds up determining the target domain name category identification result, which facilitates implementation.
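The text does not specify which domain name features the model extracts. As an illustration only, the sketch below computes a few lexical features commonly used for DGA detection (length, Shannon entropy, digit ratio, vowel ratio, longest consonant run) over a naive "main body" label; these feature choices and the function name `extract_features` are assumptions, and the classifier that consumes them is out of scope.

```python
import math
from collections import Counter
from typing import Dict

VOWELS = set("aeiou")


def extract_features(domain: str) -> Dict[str, float]:
    """Hypothetical lexical features of the label just left of the TLD."""
    labels = domain.lower().rstrip(".").split(".")
    body = labels[-2] if len(labels) >= 2 else labels[0]
    n = len(body)
    if n == 0:
        return {"length": 0.0, "entropy": 0.0, "digit_ratio": 0.0,
                "vowel_ratio": 0.0, "max_consonant_run": 0.0}
    counts = Counter(body)
    # Shannon entropy in bits per character; algorithmic names tend to score high.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    digits = sum(ch.isdigit() for ch in body)
    vowels = sum(ch in VOWELS for ch in body)
    run = best = 0
    for ch in body:
        if ch.isalpha() and ch not in VOWELS:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return {"length": float(n), "entropy": entropy, "digit_ratio": digits / n,
            "vowel_ratio": vowels / n, "max_consonant_run": float(best)}
```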
Moreover, it should be noted that the domain name category identification model is trained on sample data that includes a third domain name and its corresponding updated domain name category identification result, where the third domain name is a second domain name whose second domain name category identification result is inconsistent with its updated domain name category identification result. The updated domain name category identification result is obtained by re-checking a domain name category identification result more carefully after it has been produced, so it is more accurate. A second domain name whose second domain name category identification result disagrees with its updated result indicates that the model has not yet learned features that distinguish it; using such domain names as sample data for training therefore makes the model more accurate.
It should be noted that, once the updated domain name category identification results are available, any second domain name whose second domain name category identification result is inconsistent with its updated result can have that result updated, replacing the incorrect result with the correct one, so that subsequent matching is more accurate.
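Selecting the "third domain names" for retraining and correcting the stored results can be sketched together. The sketch assumes both the stored second-domain-name results and the updated (re-checked) results are available as `domain -> is_dga` dictionaries; the function name `select_retraining_samples` is hypothetical.

```python
from typing import Dict, List, Tuple


def select_retraining_samples(second_db: Dict[str, bool],
                              updated: Dict[str, bool]) -> List[Tuple[str, bool]]:
    """Return (domain, updated label) pairs whose stored result was wrong, fixing them in place."""
    samples: List[Tuple[str, bool]] = []
    for domain, stored in list(second_db.items()):
        if domain in updated and updated[domain] != stored:
            # Third domain name: stored result disagrees with the re-checked result.
            samples.append((domain, updated[domain]))
            # Correct the stored result so later matching uses the right verdict.
            second_db[domain] = updated[domain]
    return samples
```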
Optionally, the second matching module 506 is further configured to, when the second domain name matching result is that the target domain name does not match the second domain name, determine that the DNS message is a delayed-send message and determine the delayed sending time of the delayed-send message.
Optionally, this describes how the DNS message is handled when the second domain name matching result is that the target domain name does not match the second domain name. In this case it must still be determined whether the target domain name is a DGA domain name, so the DNS message is marked as a delayed-send message, its delayed sending time is determined, and it is sent with that delay. This ensures there is enough time to determine the domain name category of the target domain name and avoids problems such as malware infection that could result from directly sending a message for a DGA domain name or a malicious domain name.
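The delayed-send mechanism can be sketched with a min-heap ordered by sending time, so that held messages never block others. Assumptions: messages are identified by integer ids, verdicts arrive as a `msg_id -> is_dga` dictionary, and a message whose verdict is still missing when its delay expires is forwarded (fail-open); the text does not state this policy, and the class name `DelayedSendQueue` is hypothetical.

```python
import heapq
import itertools
from typing import Dict, List, Tuple


class DelayedSendQueue:
    """Holds delayed-send DNS messages until their delayed sending time."""

    def __init__(self) -> None:
        # Heap entries: (send_at, tie-breaker, message id).
        self._heap: List[Tuple[float, int, int]] = []
        self._seq = itertools.count()

    def schedule(self, msg_id: int, now: float, delay_s: float) -> None:
        heapq.heappush(self._heap, (now + delay_s, next(self._seq), msg_id))

    def due(self, now: float, verdicts: Dict[int, bool]) -> Tuple[List[int], List[int]]:
        """Pop every expired message and split it into (forwarded, blocked)."""
        forwarded: List[int] = []
        blocked: List[int] = []
        while self._heap and self._heap[0][0] <= now:
            _, _, msg_id = heapq.heappop(self._heap)
            # Assumption: a missing verdict is treated as non-DGA (fail-open).
            if verdicts.get(msg_id, False):
                blocked.append(msg_id)
            else:
                forwarded.append(msg_id)
        return forwarded, blocked
```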
Optionally, after determining, according to the domain name features, the target domain name category identification result indicating whether the target domain name is a DGA domain name, the determination module 510 is further configured to send alarm information to a predetermined terminal when the target domain name category identification result is that the target domain name is a DGA domain name, where the alarm information carries the target domain name.
Optionally, when the target domain name is a DGA domain name, alarm information carrying the target domain name is sent promptly to the predetermined terminal so that an operator using the terminal can promptly handle or verify the abnormal domain name; the actual procedure can be customized for the specific application and scenario.
Here, the acquisition module 502, the first matching module 504, the second matching module 506, the extraction module 508, and the determination module 510 correspond to steps S102 to S110 of the domain name category identification method. These modules share the examples and application scenarios of the corresponding steps but are not limited to the content disclosed in Example 1 above.
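Building and sending the alarm can be sketched as follows. The JSON field names (`type`, `domain`, `timestamp`, `src_ip`) and the `send` callable that delivers the message to the predetermined terminal are hypothetical; the text only requires that the alarm carry the target domain name.

```python
import json
import time
from typing import Callable, Optional


def send_dga_alarm(target_domain: str, is_dga: bool, send: Callable[[str], None],
                   src_ip: Optional[str] = None, ts: Optional[float] = None) -> bool:
    """Send a JSON alarm carrying the target domain name; return True if one was sent."""
    if not is_dga:
        return False  # alarms are only raised for DGA results
    payload = {"type": "dga_domain", "domain": target_domain,
               "timestamp": time.time() if ts is None else ts}
    if src_ip is not None:
        payload["src_ip"] = src_ip  # optional context for the operator
    send(json.dumps(payload))
    return True
```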
Example 3
According to another aspect of the embodiment of the present invention, there is also provided an electronic device including: a processor; a memory for storing processor-executable instructions, wherein the processor is configured to execute the instructions to implement the domain name category identification method of any of the above.
Example 4
According to another aspect of the embodiments of the present application, a computer-readable storage medium is further provided; when the instructions in the computer-readable storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform any one of the domain name category identification methods above.
The serial numbers of the foregoing embodiments of the present application are for description only and do not indicate that any embodiment is better or worse than another.
In the foregoing embodiments of the present application, the description of each embodiment has its own emphasis; for any part not described in detail in one embodiment, refer to the related descriptions in other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technical content may be implemented in other manners. The apparatus embodiments described above are merely exemplary; for example, the division into units may be merely a division of logical functions, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
If the integrated units are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present invention, and such improvements and refinements shall also fall within the protection scope of the present invention.

Claims (10)

1. A domain name category identification method, comprising:
acquiring a Domain Name System (DNS) message, wherein the DNS message carries a target domain name;
matching the target domain name with a first domain name to obtain a first domain name matching result, wherein the first domain name category confidence of the first domain name is greater than or equal to a preset confidence threshold, the first domain name category confidence is used for indicating the confidence of a first domain name category identification result of the first domain name, and the first domain name category identification result is a result of whether the first domain name is a known category domain name;
under the condition that the first domain name matching result is that the target domain name is not matched with the first domain name, matching the target domain name with a second domain name to obtain a second domain name matching result, wherein the second domain name category confidence of the second domain name is smaller than the preset confidence threshold value, the second domain name category confidence represents the confidence of a second domain name category identification result of the second domain name, and the second domain name category identification result is the result of whether the second domain name is a domain generation algorithm DGA domain name;
extracting domain name features corresponding to the target domain name when the second domain name matching result is that the target domain name does not match the second domain name;
and determining, according to the domain name features, a target domain name category identification result indicating whether the target domain name is a DGA domain name.
2. The method of claim 1, wherein the matching the target domain name with a second domain name to obtain a second domain name matching result comprises:
determining domain name parameters of the target domain name;
determining whether the domain name parameter meets a preset condition to obtain a first determination result;
and when the first determination result is that the domain name parameter meets a first preset condition, matching the target domain name with the second domain name to obtain the second domain name matching result.
3. The method of claim 1, wherein the matching the target domain name with a second domain name to obtain a second domain name matching result comprises:
determining an Internet Protocol (IP) address for sending the DNS message;
acquiring a historical DNS message sent by the IP address within a preset time range;
determining a sent-message indicator corresponding to the IP address according to the historical DNS messages;
determining whether the sent-message indicator meets a preset condition to obtain a second determination result;
and when the second determination result is that the sent-message indicator meets a second preset condition, matching the target domain name with the second domain name to obtain the second domain name matching result.
4. The method according to claim 1, wherein after the determining, according to the domain name features, of the target domain name category identification result indicating whether the target domain name is a DGA domain name, the method further comprises:
determining a target domain name category confidence of the target domain name category identification result.
5. The method according to claim 1, wherein the extracting of the domain name features corresponding to the target domain name and the determining, according to the domain name features, of the target domain name category identification result indicating whether the target domain name is a DGA domain name comprise:
inputting the target domain name into a feature extraction module of a domain name category identification model to obtain the domain name features, wherein the domain name category identification model is obtained by training on sample data, the sample data comprises a third domain name and an updated domain name category identification result corresponding to the third domain name, and the third domain name is a second domain name whose second domain name category identification result is inconsistent with its updated domain name category identification result;
and inputting the domain name features into a domain name category identification module of the domain name category identification model to obtain the target domain name category identification result.
6. The method according to claim 1, wherein after the matching the target domain name with the second domain name to obtain the second domain name matching result, further comprising:
and when the second domain name matching result is that the target domain name does not match the second domain name, determining that the DNS message is a delayed-send message, and determining a delayed sending time of the delayed-send message.
7. The method according to any one of claims 1 to 6, wherein after the determining, according to the domain name features, of the target domain name category identification result indicating whether the target domain name is a DGA domain name, the method further comprises:
when the target domain name category identification result is that the target domain name is a DGA domain name, sending alarm information to a predetermined terminal, wherein the alarm information carries the target domain name.
8. A domain name category identification device, comprising:
an acquisition module, configured to acquire a Domain Name System (DNS) message, wherein the DNS message carries a target domain name;
a first matching module, configured to match the target domain name with a first domain name to obtain a first domain name matching result, wherein a first domain name category confidence of the first domain name is greater than or equal to a preset confidence threshold, the first domain name category confidence represents the confidence of a first domain name category identification result of the first domain name, and the first domain name category identification result is a result of whether the first domain name is a domain name of a known category;
a second matching module, configured to, when the first domain name matching result is that the target domain name does not match the first domain name, match the target domain name with a second domain name to obtain a second domain name matching result, wherein a second domain name category confidence of the second domain name is less than the preset confidence threshold, the second domain name category confidence represents the confidence of a second domain name category identification result of the second domain name, and the second domain name category identification result is a result of whether the second domain name is a domain generation algorithm (DGA) domain name;
an extraction module, configured to extract domain name features corresponding to the target domain name when the second domain name matching result is that the target domain name does not match the second domain name;
and a determination module, configured to determine, according to the domain name features, a target domain name category identification result indicating whether the target domain name is a DGA domain name.
9. An electronic device, comprising:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the instructions to implement the domain name category identification method of any one of claims 1 to 7.
10. A computer-readable storage medium, wherein instructions stored in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the domain name category identification method of any one of claims 1 to 7.
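The module chain in claim 8, together with the alarm of claim 7, amounts to a three-stage cascade: an exact lookup against high-confidence stored results, then against low-confidence stored results, then a feature-based decision. The Python below is a minimal sketch under stated assumptions, not the patented implementation. The claims shown here do not specify the domain name features, the decision model, what happens on a second-tier match, or the alarm format. The following are all hypothetical stand-ins: the feature set (second-level label length, Shannon entropy, digit ratio), the threshold rule in `classify_by_features`, the 0.9 value used for the preset confidence threshold, the category strings, returning the stored label on any list match, and the JSON alarm fields. The DNS step reads only an uncompressed QNAME from the first question (RFC 1035 wire format).

```python
import json
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

PRESET_CONFIDENCE_THRESHOLD = 0.9  # hypothetical value for the "preset confidence threshold"


@dataclass
class DomainRecord:
    category: str      # stored category recognition result, e.g. "benign" or "dga" (hypothetical labels)
    confidence: float  # stored category confidence


def qname_from_dns_message(msg: bytes) -> str:
    """Acquisition step: read the first question's QNAME from a raw DNS message.

    Assumes RFC 1035 wire format and an uncompressed name, which is typical for queries.
    """
    if len(msg) < 12 or int.from_bytes(msg[4:6], "big") == 0:  # 12-byte header, QDCOUNT at bytes 4-5
        raise ValueError("no question section")
    pos, labels = 12, []
    while True:
        if pos >= len(msg):
            raise ValueError("truncated QNAME")
        length = msg[pos]
        if length == 0:  # root label terminates the name
            return ".".join(labels)
        if length & 0xC0 or pos + 1 + length > len(msg):
            raise ValueError("compressed or truncated label is not handled in this sketch")
        labels.append(msg[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length


def split_by_confidence(records: Dict[str, DomainRecord]) -> Tuple[dict, dict]:
    """Partition stored results into first (>= threshold) and second (< threshold) domain names."""
    first = {d: r for d, r in records.items() if r.confidence >= PRESET_CONFIDENCE_THRESHOLD}
    second = {d: r for d, r in records.items() if r.confidence < PRESET_CONFIDENCE_THRESHOLD}
    return first, second


def extract_features(domain: str) -> Dict[str, float]:
    """Hypothetical features of the second-level label (multi-part suffixes such as co.uk are ignored)."""
    labels = [p for p in domain.split(".") if p]
    label = labels[-2] if len(labels) >= 2 else (labels[0] if labels else "")
    n = len(label)
    if n == 0:
        return {"length": 0, "entropy": 0.0, "digit_ratio": 0.0}
    entropy = -sum(c / n * math.log2(c / n) for c in Counter(label).values())  # Shannon entropy in bits
    return {"length": n, "entropy": entropy, "digit_ratio": sum(ch.isdigit() for ch in label) / n}


def classify_by_features(features: Dict[str, float]) -> bool:
    """Stand-in decision rule; a trained classifier would normally take its place."""
    return features["length"] >= 10 and (features["entropy"] >= 3.0 or features["digit_ratio"] >= 0.3)


def identify(domain: str, first: dict, second: dict) -> Tuple[str, str]:
    """Return (category, deciding stage), following the module order of claim 8."""
    domain = domain.lower().rstrip(".")
    if domain in first:   # first matching module: high-confidence hit
        return first[domain].category, "first"
    if domain in second:  # second matching module: low-confidence hit (handling assumed)
        return second[domain].category, "second"
    return ("dga" if classify_by_features(extract_features(domain)) else "not_dga"), "features"


def handle_dns_message(msg: bytes, first: dict, second: dict,
                       send_to_terminal: Callable[[bytes], None]) -> Tuple[str, str]:
    """End-to-end sketch: parse, identify, and alarm as in claim 7."""
    domain = qname_from_dns_message(msg)
    category, stage = identify(domain, first, second)
    # Claim 7 covers the feature-based result; the alarm must carry the target domain name.
    if stage == "features" and category == "dga":
        alarm = {"type": "dga_domain", "domain": domain}  # hypothetical field names
        send_to_terminal(json.dumps(alarm).encode("utf-8"))
    return category, stage
```

Placing the two dictionary lookups before feature extraction means a domain that already has a stored result never reaches the classifier, which appears to be the purpose of splitting stored results by confidence. `send_to_terminal` is any caller-supplied transport standing in for the "preset terminal"; a list's `append` method works for local testing.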
CN202310714247.9A 2023-06-15 2023-06-15 Domain name category identification method and device and electronic equipment Pending CN116760596A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310714247.9A CN116760596A (en) 2023-06-15 2023-06-15 Domain name category identification method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310714247.9A CN116760596A (en) 2023-06-15 2023-06-15 Domain name category identification method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN116760596A 2023-09-15

Family

ID=87949097

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310714247.9A Pending CN116760596A (en) 2023-06-15 2023-06-15 Domain name category identification method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN116760596A (en)

Similar Documents

Publication Publication Date Title
CN109951500B (en) Network attack detection method and device
US9507944B2 (en) Method for simulation aided security event management
CN108683687B (en) Network attack identification method and system
CN108881263B (en) Network attack result detection method and system
CN112866023B (en) Network detection method, model training method, device, equipment and storage medium
CN107360118B (en) Advanced persistent threat attack protection method and device
CN111818103B (en) Traffic-based tracing attack path method in network target range
JP6408395B2 (en) Blacklist management method
CN114021040B (en) Method and system for alarming and protecting malicious event based on service access
CN113079150B (en) Intrusion detection method for power terminal equipment
CN112769833B (en) Method and device for detecting command injection attack, computer equipment and storage medium
CN110768949B (en) Vulnerability detection method and device, storage medium and electronic device
CN113923003A (en) Attacker portrait generation method, system, equipment and medium
CN111404768A (en) DPI recognition realization method and equipment
CN114257403B (en) False alarm detection method, equipment and readable storage medium
CN112272175A (en) Trojan horse virus detection method based on DNS
CN112583827B (en) Data leakage detection method and device
CN114285639A (en) Website security protection method and device
CN109190408B (en) Data information security processing method and system
KR20070077517A (en) Profile-based web application intrusion detection system and the method
CN115913634A (en) Network security abnormity detection method and system based on deep learning
CN113992371B (en) Threat label generation method and device for traffic log and electronic equipment
CN112565259B (en) Method and device for filtering DNS tunnel Trojan communication data
CN116760596A (en) Domain name category identification method and device and electronic equipment
Wang et al. Cyber security threat intelligence monitoring and classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination