CN112422589A - Domain name system request identification method, storage medium and electronic device - Google Patents

Publication number
CN112422589A
Authority
CN
China
Prior art keywords
dns request
target
feature
request packet
data packet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110093152.0A
Other languages
Chinese (zh)
Other versions
CN112422589B (en)
Inventor
彭婧
甘祥
郑兴
郭晶
范宇河
唐文韬
申军利
刘羽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110093152.0A
Publication of CN112422589A
Application granted
Publication of CN112422589B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/45 Network directories; Name-to-address mapping
    • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a domain name system request identification method, a storage medium and an electronic device. The method comprises the following steps: obtaining a group of domain name system DNS request data packet sequences; obtaining a feature set of each DNS request data packet sequence in the group of DNS request data packet sequences to obtain a target feature set; determining a target feature vector of the group of DNS request data packet sequences according to the target feature set; and inputting the target feature vector into a target classifier to obtain a target classification result output by the target classifier. The invention solves the technical problem in the related art that encrypted DNS request data packets are difficult to identify effectively.

Description

Domain name system request identification method, storage medium and electronic device
Technical Field
The invention relates to the field of computers, in particular to a domain name system request identification method, a storage medium and electronic equipment.
Background
In the related art, more and more network data is encrypted to protect user privacy. For example, DNS request packets used to be transmitted as plaintext network data, whereas the encrypted DNS protocols DoT and DoH proposed in recent years are steadily being adopted. In many cases, however, DNS information still needs to be analyzed to implement security detection of data transmission, for example to support the analysis of some encrypted data or to block access to malicious websites.
However, the existing techniques for identifying DNS data in the related art are not suitable for encrypted DNS data: unlike other network data, DNS data is bursty, and encrypted DNS data consists mainly of small data packets, so encrypted DNS data cannot be effectively analyzed to determine, for example, whether it is malicious.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the invention provides a domain name system request identification method, a storage medium and electronic equipment, which are used for at least solving the technical problem that an encrypted DNS request data packet is difficult to effectively identify in the related technology.
According to an aspect of the embodiments of the present invention, there is provided a method for identifying a domain name system request, including: acquiring a group of domain name system DNS request data packet sequences, wherein each DNS request data packet sequence comprises a group of encrypted DNS request data packets; acquiring a feature set of each DNS request data packet sequence in the group of DNS request data packet sequences to obtain a target feature set; determining a target feature vector of the group of DNS request data packet sequences according to the target feature set, wherein the target feature vector is used for representing the occurrence times of different feature members in the target feature set; and inputting the target feature vector into a target classifier to obtain a target classification result output by the target classifier, wherein the target classification result is used for indicating whether the group of DNS request data packet sequences are malicious DNS request data packet sequences.
Optionally, the method further comprises: repeatedly executing the following steps until the target classifier is determined:
obtaining a group of sample DNS request data packet sequences, wherein each sample DNS request data packet sequence comprises a group of encrypted DNS request data packets;
acquiring a feature set of each sample DNS request data packet sequence in the group of sample DNS request data packet sequences to obtain a sample feature set;
determining a sample feature vector of the group of sample DNS request data packet sequences according to the sample feature set, wherein the sample feature vector is used for representing the times of occurrence of different feature members in the sample feature set;
inputting the sample feature vectors into a sample classifier to obtain a sample classification result output by the sample classifier, wherein the sample classification result is used for indicating whether the group of sample DNS request data packet sequences are malicious DNS request data packet sequences or not, and indicating a malicious type to which the group of sample DNS request data packet sequences belong when the group of sample DNS request data packet sequences are the malicious DNS request data packet sequences;
determining a function value of a preset target loss function according to the sample classification result and an actual classification result of the group of sample DNS request data packet sequences, wherein the actual classification result is used for indicating whether the group of sample DNS request data packet sequences are actually malicious DNS request data packet sequences or not and indicating a malicious type to which the group of sample DNS request data packet sequences belong when the group of sample DNS request data packet sequences are actually malicious DNS request data packet sequences;
under the condition that the function value of the target loss function does not meet a preset loss condition, adjusting parameters in the sample classifier;
and under the condition that the function value of the target loss function meets the loss condition, finishing the training of the sample classifier, and determining the target classifier as the sample classifier when the training is finished.
According to another aspect of the embodiments of the present invention, there is also provided an apparatus for identifying a domain name system request, including:
a first obtaining module, configured to obtain a group of domain name system DNS request data packet sequences, wherein each DNS request data packet sequence comprises a group of encrypted DNS request data packets;
a second obtaining module, configured to obtain a feature set of each DNS request data packet sequence in the group of DNS request data packet sequences to obtain a target feature set;
a determining module, configured to determine a target feature vector of the group of DNS request data packet sequences according to the target feature set, wherein the target feature vector is used to indicate the number of times that different feature members in the target feature set appear;
and an input module, configured to input the target feature vector into a target classifier to obtain a target classification result output by the target classifier, wherein the target classification result is used to indicate whether the group of DNS request data packet sequences is a malicious DNS request data packet sequence.
Optionally, the second obtaining module includes:
an extracting unit, configured to extract the features of each DNS request data packet sequence according to sliding windows with different lengths to obtain a plurality of feature sets of each DNS request data packet sequence, wherein the target feature set comprises the plurality of feature sets of each DNS request data packet sequence.
Optionally, the extracting unit is configured to extract the features of each DNS request packet sequence according to sliding windows with different lengths in the following manner, so as to obtain a plurality of feature sets of each DNS request packet sequence:
in a case that the group of DNS request packet sequences includes K DNS request packet sequences, for each DNS request packet sequence, performing the following operation, where K is a natural number greater than 1:
obtaining the characteristics (P1, …, Pi, …, Pnj) of the jth DNS request packet sequence, wherein Pi represents the characteristics of the ith DNS request packet in the jth DNS request packet sequence, nj represents the number of DNS request packets in the jth DNS request packet sequence, 1 ≤ i ≤ nj, and 1 ≤ j ≤ K;
performing sliding extraction on the features (P1, …, Pi, …, Pnj) according to M sliding windows with different lengths to obtain M feature sets of the jth DNS request packet sequence, wherein M is a natural number greater than 1.
Optionally, the extracting unit is configured to perform sliding extraction on the features (P1, …, Pi, …, Pnj) according to M sliding windows with different lengths in the following manner, so as to obtain M feature sets of the jth DNS request packet sequence:
performing sliding extraction on the features (P1, …, Pi, …, Pnj) according to a sliding window with the length of 1 to obtain a first feature set (P1), …, (Pi), …, (Pnj);
performing sliding extraction on the features (P1, …, Pi, …, Pnj) according to a sliding window with the length of 2 to obtain a second feature set (P1, P2), …, (Pnj-1, Pnj);
wherein the M feature sets include the first feature set and the second feature set.
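The sliding extraction with windows of length 1 and 2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `sliding_feature_set` and `multi_window_feature_sets` are hypothetical, and the features are assumed to be plain integers.

```python
from typing import List, Sequence, Tuple


def sliding_feature_set(features: Sequence[int], window: int) -> List[Tuple[int, ...]]:
    """Slide a window of the given length over the features, one position at a time."""
    if window < 1:
        raise ValueError("window length must be at least 1")
    # A sequence shorter than the window yields an empty feature set.
    return [tuple(features[i:i + window]) for i in range(len(features) - window + 1)]


def multi_window_feature_sets(features: Sequence[int],
                              windows: Sequence[int] = (1, 2)) -> List[List[Tuple[int, ...]]]:
    """Return one feature set per window length (M feature sets for M windows)."""
    return [sliding_feature_set(features, w) for w in windows]


# Hypothetical features (P1, P2, P3) of one DNS request packet sequence.
first_set, second_set = multi_window_feature_sets([64, -128, 70])
# first_set  holds (P1), (P2), (P3)
# second_set holds (P1, P2), (P2, P3)
```

Each feature member is stored as a tuple so that members from windows of different lengths stay distinguishable when they are counted later.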
Optionally, the extracting unit is configured to obtain the characteristics (P1, …, Pi, …, Pnj) of the jth DNS request packet sequence by:
obtaining packet characteristics (p1, …, pi, …, pnj) of each DNS request packet in the jth DNS request packet sequence;
determining the characteristics (P1, …, Pi, …, Pnj) of the jth DNS request packet sequence according to the packet characteristics of each DNS request packet in the jth DNS request packet sequence and the transmission direction of each DNS request packet, wherein Pi = si × pi, where pi represents the packet characteristics of the ith DNS request packet in the jth DNS request packet sequence;
wherein si is 1 when the transmission direction of the ith DNS request data packet is from a client to a DNS resolver, and si is -1 when the transmission direction of the ith DNS request data packet is from the DNS resolver to the client; or si is -1 when the transmission direction of the ith DNS request packet is from the client to the DNS resolver, and si is 1 when the transmission direction of the ith DNS request packet is from the DNS resolver to the client.
Optionally, the extracting unit is configured to obtain the packet characteristics (p1, …, pi, …, pnj) of each DNS request packet in the jth DNS request packet sequence by:
obtaining the packet size of each DNS request packet in the jth DNS request packet sequence to obtain the characteristics (p1, …, pi, …, pnj), wherein pi represents the packet size of the ith DNS request packet in the jth DNS request packet sequence.
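The rule Pi = si × pi with packet size as the packet characteristic can be sketched as below. This is an illustrative sketch only: the direction labels `"c2r"` and `"r2c"` and the function name `signed_size_features` are assumptions, and the first of the two sign conventions described above (client to resolver is +1) is chosen arbitrarily.

```python
from typing import List, Tuple

CLIENT_TO_RESOLVER = "c2r"  # hypothetical label for client -> DNS resolver
RESOLVER_TO_CLIENT = "r2c"  # hypothetical label for DNS resolver -> client


def signed_size_features(packets: List[Tuple[int, str]]) -> List[int]:
    """Map (packet size, direction) pairs to signed features Pi = si * pi."""
    features = []
    for size, direction in packets:
        if direction == CLIENT_TO_RESOLVER:
            sign = 1
        elif direction == RESOLVER_TO_CLIENT:
            sign = -1
        else:
            raise ValueError(f"unknown direction: {direction!r}")
        features.append(sign * size)
    return features


# A query of 64 bytes followed by a response of 128 bytes.
features = signed_size_features([(64, "c2r"), (128, "r2c")])
```

Encoding the direction in the sign keeps each packet described by a single integer, so no payload needs to be decrypted to build the feature.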
Optionally, the determining module includes:
a first obtaining unit, configured to obtain the number Q of different feature members in the target feature set and the number of times that the Q different feature members appear in the target feature set, where Q is a natural number greater than 1;
a first determining unit, configured to determine the target feature vector as a Q-dimensional feature vector, where each vector member in the Q-dimensional feature vector is used to represent a number of times that a corresponding one of the Q different feature members appears in the target feature set.
Optionally, the first obtaining unit is configured to obtain the number Q of different feature members in the target feature set and the number of times that the Q different feature members appear in the target feature set by:
in a case where the group of DNS request data packet sequences comprises K DNS request data packet sequences, the target feature set comprises K × M feature sets of the K DNS request data packet sequences, and the M feature sets of each DNS request data packet sequence are obtained by extracting the features of that DNS request data packet sequence according to M sliding windows with different lengths, obtaining the number Q of different feature members in the K × M feature sets and the number of times that each of the Q different feature members appears in the K × M feature sets, wherein K is a natural number greater than 1, and M is a natural number greater than 1.
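Counting the Q distinct feature members across the K × M feature sets can be sketched with a frequency counter. This is a minimal sketch under assumptions: feature members are tuples of integers, the function name `feature_vector` is hypothetical, and the ordering of the vector members (by window length, then by value) is an arbitrary choice that the text does not specify.

```python
from collections import Counter
from typing import List, Tuple

Feature = Tuple[int, ...]


def feature_vector(feature_sets: List[List[Feature]]) -> Tuple[List[Feature], List[int]]:
    """Return the Q distinct feature members and the Q-dimensional count vector."""
    counts = Counter(member for fs in feature_sets for member in fs)
    # Fix an order for the Q members so that every vector position has a meaning.
    members = sorted(counts, key=lambda m: (len(m), m))
    return members, [counts[m] for m in members]


# K = 2 sequences, M = 2 window lengths, so K x M = 4 feature sets.
feature_sets = [
    [(64,), (-128,), (64,)],      # sequence 1, window length 1
    [(64, -128), (-128, 64)],     # sequence 1, window length 2
    [(64,), (-128,)],             # sequence 2, window length 1
    [(64, -128)],                 # sequence 2, window length 2
]
members, vector = feature_vector(feature_sets)
```

When a trained classifier is used, the member order would presumably be fixed once on the training data so that vectors from different groups stay comparable; that detail is an assumption here.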
Optionally, the input module includes:
a second determining unit, configured to determine, by each target decision tree of the S target decision trees, a decision result according to the target feature vector under the condition that the target classifier includes S target decision trees, to obtain S decision results, where each decision result of the S decision results is used to indicate whether the group of DNS request packet sequences is a malicious DNS request packet sequence, and S is a natural number greater than 1;
and a third determining unit, configured to determine the target classification result according to the S decision results.
Optionally, the third determining unit is configured to determine the target classification result according to the S decision results by:
determining the target classification result as the group of DNS request packet sequences being malicious DNS request packet sequences when the number of first decision results in the S decision results is greater than the number of second decision results, where the first decision result is used to indicate that the group of DNS request packet sequences is a malicious DNS request packet sequence, and the second decision result is used to indicate that the group of DNS request packet sequences is not a malicious DNS request packet sequence;
determining the target classification result as the group of DNS request packet sequences not being a malicious DNS request packet sequence when the number of the first decision results in the S decision results is less than the number of the second decision results.
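The comparison of first and second decision results amounts to a majority vote over the S trees, which might look like the sketch below. The function name `majority_vote` is hypothetical, each decision is assumed to be a boolean (True meaning malicious), and returning `None` on a tie is an assumption, since the text does not say what happens when the two counts are equal.

```python
from typing import List, Optional


def majority_vote(decisions: List[bool]) -> Optional[bool]:
    """Combine S per-tree decisions; True means 'malicious'."""
    malicious = sum(1 for d in decisions if d)   # number of first decision results
    benign = len(decisions) - malicious          # number of second decision results
    if malicious > benign:
        return True
    if malicious < benign:
        return False
    return None  # tie: not covered by the text, left undecided here


verdict = majority_vote([True, True, False])
```

Choosing an odd S avoids ties altogether in the two-class case.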
Optionally, the third determining unit is configured to determine the target classification result according to the S decision results by:
when third decision results are the most numerous among the S decision results, determining that the target classification result indicates that the group of DNS request packet sequences is a malicious DNS request packet sequence of a target malicious type, wherein the third decision result is used to indicate that the group of DNS request packet sequences is of the target malicious type;
wherein each decision result in the S decision results is used to indicate whether the group of DNS request packet sequences is a malicious DNS request packet sequence and, when the group of DNS request packet sequences is a malicious DNS request packet sequence, further indicates the malicious type to which the group of DNS request packet sequences belongs.
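When each tree outputs a type rather than a yes/no answer, selecting the most numerous decision result is a plurality vote. The sketch below is illustrative only: the labels `"benign"`, `"tunnel"` and `"dga"` are invented placeholders, the function name `plurality_vote` is hypothetical, and ties fall to the label seen first, which is a property of `Counter.most_common` rather than anything stated in the text.

```python
from collections import Counter
from typing import List


def plurality_vote(decisions: List[str]) -> str:
    """Return the label that the largest number of trees voted for."""
    if not decisions:
        raise ValueError("at least one decision result is required")
    label, _count = Counter(decisions).most_common(1)[0]
    return label


# Hypothetical per-tree outputs for S = 4 trees.
target_type = plurality_vote(["tunnel", "benign", "tunnel", "dga"])
```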
Optionally, the apparatus is further configured to:
determining a target confidence of the target classification result according to the target feature vector and a training feature vector when the target classification result indicates that the group of DNS request packet sequences is a malicious DNS request packet sequence of a target malicious type, wherein the training feature vector is used for indicating the occurrence times of different feature members in a sample feature set, the sample feature set is a set determined according to a feature set of each sample DNS request packet sequence in the group of sample DNS request packet sequences, and the group of sample DNS request packet sequences are actually the malicious DNS request packet sequence of the target malicious type;
determining the group of DNS request data packet sequences as malicious DNS request data packet sequences of the target malicious type under the condition that the target confidence is larger than a preset confidence threshold;
determining the set of DNS request packet sequences as DNS request packet sequences of unknown class if the target confidence is less than the confidence threshold.
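The confidence check that follows classification can be sketched as a simple threshold. This sketch takes the confidence value as an input and does not attempt to compute it; the function name `apply_confidence_threshold`, the `"unknown"` label and the example values are assumptions. The text does not cover the case where the confidence equals the threshold, so treating it as unknown is also an assumption.

```python
UNKNOWN = "unknown"  # hypothetical label for the unknown class


def apply_confidence_threshold(predicted_type: str, confidence: float, threshold: float) -> str:
    """Keep the predicted malicious type only when the confidence exceeds the threshold."""
    if confidence > threshold:
        return predicted_type
    # Below the threshold the group is reported as an unknown class; the equal
    # case is folded in here as a conservative assumption.
    return UNKNOWN


accepted = apply_confidence_threshold("tunnel", confidence=0.9, threshold=0.8)
rejected = apply_confidence_threshold("tunnel", confidence=0.5, threshold=0.8)
```

This lets the classifier decline to force traffic of an unseen kind into one of the known malicious types.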
Optionally, the apparatus is configured to determine a target confidence of the target classification result according to the target feature vector and the training feature vector by:
determining the target confidence of the target classification result by the following formula:
[formula image not reproduced in the source text]
where the quantity given by the formula represents the target confidence of the target classification result, Q represents the number of different feature members in the target feature set, the target feature vector is a Q-dimensional vector whose ith member represents the number of times that the ith different feature member in the target feature set appears in the target feature set, and the training feature vector is a Q-dimensional vector whose ith member represents the average number of occurrences of the ith different feature member of the sample feature set in the sample feature set.
Optionally, the apparatus is further configured to:
repeatedly executing the following steps until the target classifier is determined:
obtaining a group of sample DNS request data packet sequences, wherein each sample DNS request data packet sequence comprises a group of encrypted DNS request data packets;
acquiring a feature set of each sample DNS request data packet sequence in the group of sample DNS request data packet sequences to obtain a sample feature set;
determining a sample feature vector of the group of sample DNS request data packet sequences according to the sample feature set, wherein the sample feature vector is used for representing the times of occurrence of different feature members in the sample feature set;
inputting the sample feature vectors into a sample classifier to obtain a sample classification result output by the sample classifier, wherein the sample classification result is used for indicating whether the group of sample DNS request data packet sequences are malicious DNS request data packet sequences or not, and indicating a malicious type to which the group of sample DNS request data packet sequences belong when the group of sample DNS request data packet sequences are the malicious DNS request data packet sequences;
determining a function value of a preset target loss function according to the sample classification result and an actual classification result of the group of sample DNS request data packet sequences, wherein the actual classification result is used for indicating whether the group of sample DNS request data packet sequences are actually malicious DNS request data packet sequences or not and indicating a malicious type to which the group of sample DNS request data packet sequences belong when the group of sample DNS request data packet sequences are actually malicious DNS request data packet sequences;
under the condition that the function value of the target loss function does not meet a preset loss condition, adjusting parameters in the sample classifier;
and under the condition that the function value of the target loss function meets the loss condition, finishing the training of the sample classifier, and determining the target classifier as the sample classifier when the training is finished.
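The train-until-the-loss-condition-holds loop can be illustrated with a toy example. The sketch below is a stand-in under loud assumptions: it uses a two-class logistic regression trained by gradient descent, whereas the text does not specify the classifier's parameters or the loss function, and the names `train`, `log_loss`, the learning rate, the loss threshold and the sample data are all invented for illustration.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: List[float], bias: float, x: List[float]) -> float:
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def log_loss(weights: List[float], bias: float,
             samples: List[List[float]], labels: List[int]) -> float:
    """Mean cross-entropy between predictions and the actual labels."""
    total = 0.0
    for x, y in zip(samples, labels):
        p = min(max(predict(weights, bias, x), 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(samples)


def train(samples: List[List[float]], labels: List[int], loss_threshold: float = 0.1,
          lr: float = 0.5, max_rounds: int = 10000) -> Tuple[List[float], float]:
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(max_rounds):
        # Loss condition met: training ends and the current model is kept.
        if log_loss(weights, bias, samples, labels) <= loss_threshold:
            break
        # Loss condition not met: adjust the parameters with one gradient step.
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(samples, labels):
            err = predict(weights, bias, x) - y
            grad_w = [g + err * v for g, v in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias


# Hypothetical sample feature vectors (counts) with labels 1 = malicious, 0 = benign.
samples = [[0.0, 1.0], [1.0, 0.0], [0.0, 2.0], [2.0, 0.0]]
labels = [1, 0, 1, 0]
weights, bias = train(samples, labels)
```

The `max_rounds` cap is an added safeguard so the loop terminates even if the loss condition is never reached.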
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, in which a computer program is stored, where the computer program is configured to execute the above method for identifying a domain name system request when the computer program is executed.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory and a processor, where the memory stores therein a computer program, and the processor is configured to execute the above method for identifying a domain name system request by using the computer program.
In the embodiments of the present invention, a group of domain name system DNS request data packet sequences is obtained; a feature set of each DNS request data packet sequence in the group is obtained to produce a target feature set; a target feature vector of the group of DNS request data packet sequences is determined according to the target feature set; and the target feature vector is input into a target classifier to obtain the target classification result output by the target classifier. By obtaining a group of DNS request data packets, extracting a target feature set that represents them, determining the target feature vector, and using the classifier to identify whether the group is a malicious DNS request data packet sequence, DNS request data packets can be effectively identified while the privacy and security of the DNS request data packets sent by a user remain protected. This achieves the technical effect of improving the identification efficiency of DNS request data packets and solves the technical problem in the related art that encrypted DNS request data packets are difficult to identify effectively.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a schematic diagram of an application environment of an alternative domain name system request identification method according to an embodiment of the invention;
FIG. 2 is a flowchart illustrating an alternative method for identifying a domain name system request according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an alternative domain name system request identification method according to an embodiment of the invention;
FIG. 4 is a schematic diagram of an alternative domain name system request identification method according to an embodiment of the invention;
FIG. 5 is a schematic diagram of an alternative domain name system request identification method according to an embodiment of the invention;
FIG. 6 is a schematic diagram of an alternative domain name system request identification method according to an embodiment of the invention;
FIG. 7 is a schematic diagram of an alternative domain name system request identification method according to an embodiment of the invention;
FIG. 8 is a schematic diagram of yet another alternative domain name system request identification method according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of an alternative domain name system request identification apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of an alternative electronic device according to an embodiment of the invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some of the nouns and terms appearing in the description of the embodiments of the present application are explained as follows:
DNS: the Domain Name System (abbreviation: DNS) is a service of the Internet. It acts as a distributed database that maps domain names and IP addresses to each other, enabling people to access the Internet more conveniently.
Traffic: a data packet or a sequence of data packets.
DoT: DNS over TLS (abbreviation: DoT) is a security protocol that encrypts and encapsulates Domain Name System (DNS) traffic using the Transport Layer Security (TLS) protocol. It is intended to prevent eavesdropping on and tampering with DNS data through man-in-the-middle attacks, so as to protect user privacy.
DoH: DNS over HTTPS (abbreviation: DoH) is a secure domain name resolution scheme. It sends DNS resolution requests over the encrypted HTTPS protocol, which prevents a user's DNS resolution requests from being intercepted or modified (for example, by a man-in-the-middle attack) as they can be under the original DNS protocol, thereby protecting user privacy.
N-Gram: an algorithm based on a statistical language model. The basic idea is to slide a window of size N over the content of a text byte by byte, forming a sequence of byte fragments of length N. Using Latin numerical prefixes, an n-gram of size 1 is referred to as a "uni-gram", size 2 as a "bi-gram", and size 3 as a "tri-gram".
Random forest (RF) algorithm: in machine learning, a random forest is a classifier that contains multiple decision trees, and its output class is the mode of the classes output by the individual trees.
Decision tree: a decision tree represents the splitting of data using a tree data structure: the nodes represent conditions on one of the data features, and the branches represent decisions based on the evaluation of those conditions.
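As a small illustration of the N-Gram entry above, the following sketch slides a window of N bytes over a piece of text. The function name `byte_ngrams` and the sample string are assumptions, and UTF-8 is assumed as the byte encoding.

```python
from typing import List


def byte_ngrams(text: str, n: int) -> List[bytes]:
    """Return every fragment of n consecutive bytes in the UTF-8 encoding of text."""
    if n < 1:
        raise ValueError("n must be at least 1")
    data = text.encode("utf-8")
    return [data[i:i + n] for i in range(len(data) - n + 1)]


unigrams = byte_ngrams("dns", 1)
bigrams = byte_ngrams("dns", 2)
```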
The invention is illustrated below with reference to examples:
According to an aspect of the embodiments of the present invention, a method for identifying a domain name system request is provided. Optionally, in this embodiment, the method may be applied to a hardware environment formed by a server 101 and a user terminal 103, as shown in FIG. 1. The server 101 is connected to the user terminal 103 through a network and may be configured to provide a service to the user terminal or to a client installed on the user terminal, where the client may be a video client, an instant messaging client, a browser client, an education client, a game client, or the like. A database 105 may be provided on the server or separately from it to provide data storage services for the server 101, for example as a game data storage server. The network may include, but is not limited to, a wired network and a wireless network, where the wired network includes a local area network, a metropolitan area network, and a wide area network, and the wireless network includes Bluetooth, WIFI, and other networks that enable wireless communication. The user terminal 103 may include, but is not limited to, at least one of: […]. The server may be a single server, a server cluster composed of a plurality of servers, or a cloud server, and may include, but is not limited to, a router or a gateway. The DNS request packet may be sent by an application 107 installed in the user terminal 103 to request a corresponding IP address so as to access an Internet service. A resolver deployed on the server 101 executes the identification method of the domain name system request to identify the DNS request packets and thereby determine whether a group of DNS request packet sequences is a malicious DNS request packet sequence.
As shown in fig. 1, the above method for identifying a domain name system request can be implemented by the following steps:
s1, the server 101 obtains a group of DNS request packet sequences sent by the user terminal 103, where each DNS request packet sequence includes a group of encrypted DNS request packets;
s2, the server 101 obtains a feature set of each DNS request packet sequence in a group of DNS request packet sequences to obtain a target feature set;
s3, the server 101 determines a target feature vector of a group of DNS request data packet sequences according to the target feature set, wherein the target feature vector is used for representing the occurrence times of different feature members in the target feature set;
s4, the server 101 inputs the target feature vector into the target classifier, and obtains a target classification result output by the target classifier, where the target classification result is used to indicate whether a group of DNS request packet sequences is a malicious DNS request packet sequence.
Optionally, in this embodiment, the above method for identifying a domain name system request may also be performed by a user terminal, including but not limited to a user terminal configured with the required computing capability.
Optionally, in this embodiment, the identification method of the domain name system request may also be performed, without limitation, asynchronously by the server 101 and the user terminal 103.
The above is merely an example, and the present embodiment is not particularly limited.
Optionally, as an optional implementation manner, as shown in fig. 2, the method for identifying a domain name system request includes:
s202, obtaining a group of domain name system DNS request data packet sequences, wherein each DNS request data packet sequence comprises a group of encrypted DNS request data packets;
s204, acquiring a feature set of each DNS request data packet sequence in the group of DNS request data packet sequences to obtain a target feature set;
s206, determining a target feature vector of the group of DNS request data packet sequences according to the target feature set, wherein the target feature vector is used for representing the occurrence times of different feature members in the target feature set;
s208, inputting the target feature vector into a target classifier to obtain a target classification result output by the target classifier, wherein the target classification result is used for indicating whether the group of DNS request data packet sequences are malicious DNS request data packet sequences.
Optionally, in this embodiment, the Domain Name System (DNS) serves as a distributed database that maps domain names and IP addresses to each other, so that users can access the internet more conveniently and quickly. A DNS request packet is a packet based on the DNS service that is used to obtain, from a corresponding DNS server, the IP address corresponding to a domain name, thereby meeting the actual need to access the internet. The DNS request packet may include, but is not limited to, a source IP, a destination IP, and other message data, and may also include, but is not limited to, a source IP, a source port, a destination IP, a destination port, and other message data.
Optionally, in this embodiment, the method for identifying a domain name system request may be applied to application scenarios including, but not limited to, medical treatment, finance, credit investigation, banking, government affairs, games, energy, education, security, building, traffic, internet of things, artificial intelligence, intelligent hardware, and industry, and may also be applied to, but is not limited to, cloud technology application scenarios.
Cloud technology is a hosting technology that unifies a series of resources such as hardware, software, and networks in a wide area network or a local area network to realize the computation, storage, processing, and sharing of data.
Cloud technology (Cloud technology) is a general term for network technology, information technology, integration technology, management platform technology, application technology and the like applied in a Cloud computing business model, and can form a resource pool which can be used as required and is flexible and convenient. Cloud computing technology will become an important support. Background services of the technical network system require a large amount of computing and storage resources, such as video websites, picture-like websites and more web portals. With the high development and application of the internet industry, each article may have its own identification mark and needs to be transmitted to a background system for logic processing, data in different levels are processed separately, and various industrial data need strong system background support and can only be realized through cloud computing.
Cloud computing (cloud computing) refers to a delivery and use mode of an IT infrastructure, and refers to obtaining required resources in an on-demand and easily-extensible manner through a network; the generalized cloud computing refers to a delivery and use mode of a service, and refers to obtaining a required service in an on-demand and easily-extensible manner through a network. Such services may be IT and software, internet related, or other services. Cloud Computing is a product of development and fusion of traditional computers and Network Technologies, such as Grid Computing (Grid Computing), distributed Computing (distributed Computing), Parallel Computing (Parallel Computing), Utility Computing (Utility Computing), Network Storage (Network Storage Technologies), Virtualization (Virtualization), Load balancing (Load Balance), and the like.
With the development of diversification of internet, real-time data stream and connecting equipment and the promotion of demands of search service, social network, mobile commerce, open collaboration and the like, cloud computing is rapidly developed. Different from the prior parallel distributed computing, the generation of cloud computing can promote the revolutionary change of the whole internet mode and the enterprise management mode in concept.
Cloud Security (Cloud Security) refers to a generic term for Security software, hardware, users, organizations, secure Cloud platforms based on Cloud computing business model applications. The cloud security integrates emerging technologies and concepts such as parallel processing, grid computing and unknown virus behavior judgment, abnormal monitoring of software behaviors in the network is achieved through a large number of meshed clients, the latest information of trojans and malicious programs in the internet is obtained and sent to the server for automatic analysis and processing, and then the virus and trojan solution is distributed to each client.
The main research directions of cloud security include: 1. the cloud computing security mainly researches how to guarantee the security of the cloud and various applications on the cloud, including the security of a cloud computer system, the secure storage and isolation of user data, user access authentication, information transmission security, network attack protection, compliance audit and the like; 2. the cloud of the security infrastructure mainly researches how to adopt cloud computing to newly build and integrate security infrastructure resources and optimize a security protection mechanism, and comprises the steps of constructing a super-large-scale security event and an information acquisition and processing platform through a cloud computing technology, realizing the acquisition and correlation analysis of mass information, and improving the handling control capability and the risk control capability of the security event of the whole network; 3. the cloud security service mainly researches various security services, such as anti-virus services and the like, provided for users based on a cloud computing platform.
The so-called artificial intelligence cloud service is also generally called AIaaS (AI as a Service). It is a service mode of an artificial intelligence platform; specifically, the AIaaS platform splits several types of common AI services and provides them in the cloud as independent or packaged services. This service model is similar to opening an AI-themed store: all developers can access one or more artificial intelligence services provided by the platform through an API (application programming interface), and some experienced developers can also use the AI framework and AI infrastructure provided by the platform to deploy, operate, and maintain their own dedicated cloud artificial intelligence services.
Optionally, in this embodiment, the method for identifying a domain name system request may also be applied to, but is not limited to, blockchain application scenarios.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism and an encryption algorithm. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer.
The blockchain underlying platform may comprise processing modules such as user management, basic service, smart contract, and operation monitoring. The user management module is responsible for identity information management of all blockchain participants, including public and private key generation and maintenance (account management), key management, and maintenance of the correspondence between a user's real identity and blockchain address (authority management); where authorized, it supervises and audits the transactions of certain real identities and provides rule configuration for risk control (risk control audit). The basic service module is deployed on all blockchain node devices and is used to verify the validity of service requests and to record valid requests to storage after consensus on them is completed; for a new service request, the basic service first performs interface adaptation, parsing, and authentication (interface adaptation), then encrypts the service information through a consensus algorithm (consensus management), transmits the encrypted information completely and consistently to the shared ledger (network communication), and records and stores it. The smart contract module is responsible for contract registration and issuance, contract triggering, and contract execution; developers can define contract logic in a programming language, publish it to the blockchain (contract registration), and have it executed according to the logic of the contract terms when invoked by a key or triggered by another event, and the module also provides functions for upgrading and cancelling contracts. The operation monitoring module is mainly responsible for deployment, configuration modification, contract settings, and cloud adaptation during product release, and for the visual output of real-time status during product operation, such as alarms, monitoring of network conditions, and monitoring of node device health.
The platform product service layer provides basic capability and an implementation framework of typical application, and developers can complete block chain implementation of business logic based on the basic capability and the characteristics of the superposed business. The application service layer provides the application service based on the block chain scheme for the business participants to use.
Optionally, in this embodiment, the DNS request packet may include, but is not limited to, a request packet transmitted based on the HTTPS communication protocol, and may also include, but is not limited to, a request packet transmitted based on the SSL or TLS protocol.
Optionally, in this embodiment, the feature set of a DNS request packet sequence may be formed by, but is not limited to, taking the size of each DNS request packet in the sequence as the feature value of that packet, the feature values together forming the feature set of the DNS request packet sequence.
For example, if the DNS request packet sequence includes i packets, the feature set of each DNS request packet sequence is obtained by obtaining the size of each packet in the i packets and using the size as a feature.
Optionally, in this embodiment, the target feature set may be obtained by applying an N-gram model to the feature set of each DNS request packet sequence.
Optionally, in this embodiment, determining the target feature vector of the group of DNS request packet sequences according to the target feature set may include, but is not limited to, the following ways:
s1, calculating the number of occurrences of the different feature members in the target feature set;
s2, determining the target feature vector according to the number of occurrences of the different feature members.
The target feature vector may include, but is not limited to, a row vector, and the parameter of each column in the row vector is used to indicate the number of occurrences of the feature member in the target feature set.
The above is merely an example, and the present embodiment is not limited in any way.
Optionally, in this embodiment, the target classifier is configured to output a target classification result, where the target classification result is used to indicate whether the obtained group of DNS request packet sequences is a malicious DNS request packet sequence, and the target classifier may include, but is not limited to, a classifier designed based on an RF random forest algorithm.
It should be noted that, in machine learning, a random forest is a classifier comprising a plurality of decision trees, whose output class is the mode of the classes output by the individual trees. In this embodiment, the random forest algorithm uses different features and data subsets in each decision tree, randomizing data and features over a large number of decision trees, so as to address the technical problem in the related art of overfitting to the training data. Further, the generalization error of the classifier may be measured using, but not limited to, ten-fold cross-validation, in which the samples of each class are divided into multiple disjoint partitions.
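The two ideas in the preceding paragraph, taking the mode of the per-tree outputs and dividing the samples of each class into disjoint partitions, can be sketched as follows. This is a minimal illustration only: the function names, the three threshold "trees", and the round-robin fold assignment are assumptions made for the sketch, not the classifier of the embodiment.

```python
import random
from collections import Counter, defaultdict


def forest_predict(trees, x):
    """Return the mode of the labels predicted by the individual trees."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


def stratified_folds(labels, n_folds=10, seed=0):
    """Split sample indices into n_folds disjoint partitions, class by class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the samples of this class out to the folds in turn.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


# Three hypothetical "trees", each a threshold test on one vector member.
trees = [
    lambda x: "malicious" if x[0] > 2 else "benign",
    lambda x: "malicious" if x[1] > 3 else "benign",
    lambda x: "malicious" if x[2] > 0 else "benign",
]

print(forest_predict(trees, (3, 2, 1)))  # two of three trees vote "malicious"
```

In a cross-validation run, each of the ten folds would be held out once for testing while the remaining nine are used for training.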
Fig. 3 is a schematic diagram of an optional domain name system request identification method according to an embodiment of the present invention, and as shown in fig. 3, the process includes the following steps:
s302, carrying out flow acquisition to obtain a group of domain name system DNS request data packet sequences;
s304, determining an encrypted DNS request data packet sequence from a group of domain name system DNS request data packet sequences;
s306, extracting features based on the size of the DNS request data packet sequence to obtain the target feature vector;
s308, inputting the target feature vector into the trained encrypted DNS request classifier;
s310, outputting whether the encrypted DNS request is a malicious DNS request;
s312, blocking the encrypted DNS request when it is determined that the encrypted DNS request is a malicious DNS request, thereby completing identification of the DNS request packet and improving the identification efficiency of the DNS request packet.
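The flow of steps s302 to s312 can be sketched as below. Everything concrete here is an assumption made for illustration: the dictionary layout of a captured sequence, the use of port 853 (the DNS-over-TLS port) as the test for "encrypted", the sorted column order, and the toy threshold that stands in for the trained classifier.

```python
from collections import Counter


def is_encrypted(sequence):
    # Assumption: treat DNS-over-TLS traffic (port 853) as encrypted.
    return sequence["port"] == 853


def extract_vector(sequences):
    # s306: count how often each packet size occurs across the sequences.
    counts = Counter(size for seq in sequences for size in seq["sizes"])
    return [counts[key] for key in sorted(counts)]


def toy_classifier(vector):
    # Placeholder for the trained classifier: flags any size seen 4+ times.
    return max(vector) >= 4


def handle_traffic(sequences, classifier=toy_classifier):
    encrypted = [seq for seq in sequences if is_encrypted(seq)]  # s304
    if not encrypted:
        return "pass"
    vector = extract_vector(encrypted)        # s306
    malicious = classifier(vector)            # s308, s310
    return "block" if malicious else "pass"   # s312


captured = [
    {"port": 853, "sizes": [2, 3, -3, 3, 3, 3]},
    {"port": 53, "sizes": [60, 120]},
]
print(handle_traffic(captured))
```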
According to this embodiment, a group of domain name system DNS request data packet sequences is obtained; the feature set of each DNS request data packet sequence in the group is obtained to yield a target feature set; a target feature vector of the group of DNS request data packet sequences is determined according to the target feature set; and the target feature vector is input into a target classifier to obtain the target classification result output by the target classifier. By extracting the target feature set that represents the group of DNS request data packets, determining the target feature vector from it, and using the classifier to identify whether the group is a malicious DNS request data packet sequence, the DNS request data packets can be effectively identified, the technical effect of improving the identification efficiency of DNS request data packets is achieved, and the technical problem in the related art that encrypted DNS request data packets are difficult to identify effectively is solved.
As an optional scheme, the obtaining a feature set of each DNS request packet sequence in the group of DNS request packet sequences to obtain a target feature set includes:
and extracting the features of each DNS request data packet sequence according to sliding windows with different lengths to obtain a plurality of feature sets of each DNS request data packet sequence, wherein the target feature set comprises the plurality of feature sets of each DNS request data packet sequence.
Optionally, in this embodiment, the sliding windows with different lengths may include, but are not limited to, being implemented based on an N-gram algorithm, performing a sliding window operation with a size of N on each DNS request packet sequence by byte, and forming a byte fragment sequence with a length of N. Using the Latin-word prefix, an n-gram of size 1 is referred to as a "uni-gram"; size 2 is then "bi-gram".
For example, taking the case where the group of DNS request packet sequences includes two DNS request packet sequences, the first DNS request packet sequence includes i DNS request packets, and Pi is used to indicate the size of the ith packet; the feature set of this DNS request packet sequence may include, but is not limited to, the following set:
(P1、P2、P3、…Pi);
for example, the second DNS request packet sequence includes j DNS request packets, and Qj is used to represent the size of the jth packet; the feature set of this DNS request packet sequence may include, but is not limited to, the following set:
(Q1、Q2、Q3、…Qj);
the following characteristic members can be obtained by adopting the N-gram model:
Uni-gram:(P1)、(P2)、(P3)、…、(Pi)、(Q1)、(Q2)、(Q3)、…(Qj);
Bi-gram:(P1,P2)、(P2,P3)、…、(Pi-1,Pi)、(Q1,Q2)、(Q2,Q3)、…(Qj-1,Qj);
the target feature set obtained by taking the Uni-grams and the Bi-grams of the feature set of each DNS request packet sequence in the group of DNS request packet sequences is:
(P1)、(P2)、(P3)、…、(Pi)、(Q1)、(Q2)、(Q3)、…(Qj)、(P1,P2)、(P2,P3)、…、(Pi-1,Pi)、(Q1,Q2)、(Q2,Q3)、…(Qj-1,Qj)。
the above is merely an example, and the present embodiment is not limited in any way.
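The Uni-gram and Bi-gram extraction described above can be sketched as follows. The function names and the packet sizes are illustrative assumptions; the output order (all Uni-grams first, then all Bi-grams) follows the listing given above.

```python
def ngrams(sizes, n):
    """Slide a window of length n over the sizes and return the tuples."""
    return [tuple(sizes[k:k + n]) for k in range(len(sizes) - n + 1)]


def target_feature_set(sequences, lengths=(1, 2)):
    """Concatenate the n-grams of every sequence, one window length at a time."""
    features = []
    for n in lengths:
        for sizes in sequences:
            features.extend(ngrams(sizes, n))
    return features


p = [120, 300, 120]  # hypothetical sizes P1, P2, P3
q = [80, 300]        # hypothetical sizes Q1, Q2
print(target_feature_set([p, q]))
```

A sequence shorter than the window simply contributes no members for that window length.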
According to this embodiment, the features of each DNS request data packet sequence are extracted according to sliding windows of different lengths to obtain a plurality of feature sets of each DNS request data packet sequence, so that the technical effect of improving the identification efficiency of the DNS request data packets is achieved, and the technical problem that the encrypted DNS request data packets are difficult to effectively identify in the related technology is solved.
As an optional scheme, the extracting features of each DNS request packet sequence according to sliding windows of different lengths to obtain a plurality of feature sets of each DNS request packet sequence includes:
in a case that the group of DNS request packet sequences includes K DNS request packet sequences, for each DNS request packet sequence, performing the following operation, where K is a natural number greater than 1:
obtaining the characteristics (P1, …, Pi, …, Pnj) of the jth DNS request packet sequence, wherein Pi represents the characteristics of the ith DNS request packet in the jth DNS request packet sequence, nj represents the number of DNS request packets in the jth DNS request packet sequence, 1 ≤ j ≤ K, and 1 ≤ i ≤ nj;
and performing sliding extraction on the features (P1, …, Pi, … and Pnj) according to M sliding windows with different lengths to obtain M feature sets of the jth DNS request packet sequence, wherein M is a natural number greater than 1.
Optionally, in this embodiment, K may be set according to actual requirements, and M may be configured based on the N-gram model, including but not limited to a configuration of 1 or 2.
Optionally, in this embodiment, fig. 4 is a schematic diagram of another optional identification method for a domain name system request according to an embodiment of the present invention, and as shown in fig. 4, feature extraction is performed on the K DNS request packet sequences, and the obtained plurality of feature sets of each DNS request packet sequence may include, but are not limited to, the following:
the 1st DNS request packet sequence includes n DNS request packets, Pi is used to represent the feature of the ith DNS request packet, and sliding extraction is performed on the n DNS request packets according to a sliding window with the length of 1, so that the 1st feature set in the M feature sets may include, but is not limited to, the following:
(P1)、(P2)、…、(Pi)、…、(Pnj);
sliding extraction is performed on the n DNS request packets according to a sliding window with the length of 2, so that the 2nd feature set in the M feature sets may include, but is not limited to, the following:
(P1,P2)、(P2,P3)、…、(Pnj-1,Pnj);
Through this embodiment, when a group of DNS request packet sequences includes K DNS request packet sequences, the following operations are performed for each DNS request packet sequence: the features (P1, …, Pi, …, Pnj) of the jth DNS request packet sequence are obtained, and the features are extracted in a sliding manner according to M sliding windows of different lengths to obtain M feature sets of the jth DNS request packet sequence.
As an alternative, sliding extraction is performed on the features (P1, …, Pi, …, Pnj) according to M sliding windows with different lengths, so as to obtain M feature sets of the jth DNS request packet sequence, where the M feature sets include:
performing sliding extraction on the features (P1, …, Pi, …, Pnj) according to a sliding window with the length of 1 to obtain a first feature set (P1), …, (Pi), …, (Pnj);
performing sliding extraction on the features (P1, …, Pi, …, Pnj) according to a sliding window with the length of 2 to obtain a second feature set (P1, P2), …, (Pnj-1, Pnj);
wherein the M feature sets include the first feature set and the second feature set.
Optionally, in this embodiment, the length of the sliding window may be adjusted according to actual conditions.
Optionally, in this embodiment, taking K =1 as an example, the target feature set may include, but is not limited to, the following:
(P1), …, (Pi), …, (Pnj), (P1, P2), …, (Pnj-1, Pnj); that is, after sliding extraction is performed according to a sliding window with the length of 1 and a sliding window with the length of 2, the target feature set comprises 2nj-1 feature members (nj from the window with the length of 1 and nj-1 from the window with the length of 2).
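The member count stated above follows from the number of positions a window can take: a window of length m over n_j features has n_j - m + 1 positions. The general form below additionally assumes that the M window lengths are exactly 1, 2, …, M and that M does not exceed n_j, which the embodiment does not require.

```latex
\[
\underbrace{n_j}_{\text{length-1 windows}} + \underbrace{(n_j - 1)}_{\text{length-2 windows}} = 2n_j - 1,
\qquad
\sum_{m=1}^{M} (n_j - m + 1) = M\,n_j - \frac{M(M-1)}{2} \quad (M \le n_j).
\]
```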
By this embodiment, the characteristics (P1, …, Pi, …, Pnj) are extracted in a sliding manner according to a sliding window with the length of 1 to obtain the first feature set (P1), …, (Pi), …, (Pnj), and are extracted in a sliding manner according to a sliding window with the length of 2 to obtain the second feature set (P1, P2), …, (Pnj-1, Pnj), so that the technical effect of improving the identification efficiency of the DNS request data packet can be achieved, and the technical problem that the encrypted DNS request data packet is difficult to identify effectively in the related technology is solved.
As an alternative, the obtaining the characteristics (P1, …, Pi, …, Pnj) of the jth DNS request packet sequence includes:
obtaining packet characteristics (p1, …, pi, …, pnj) of each DNS request packet in the jth DNS request packet sequence;
determining characteristics (P1, …, Pi, …, Pnj) of the jth DNS request packet sequence according to the packet characteristics of each DNS request packet in the jth DNS request packet sequence and the transmission direction of each DNS request packet, wherein Pi = si × pi, and pi represents the packet characteristics of the ith DNS request packet in the jth DNS request packet sequence;
wherein si is 1 when the transmission direction of the ith DNS request data packet is from a client to a DNS resolver, and si is -1 when the transmission direction of the ith DNS request data packet is from the DNS resolver to the client; or si is -1 when the transmission direction of the ith DNS request packet is from the client to the DNS resolver, and si is 1 when the transmission direction of the ith DNS request packet is from the DNS resolver to the client.
Optionally, in this embodiment, since the characteristics of the DNS request packets in different transmission directions may be consistent in size, but the specific meanings represented by the DNS request packets are not completely the same, the extracted characteristics of the packets may be adjusted according to the transmission direction of the DNS request packets to determine the characteristics of the DNS request packet sequence.
For example, but without limitation, the characteristics (P1, …, Pi, …, Pnj) of the jth DNS request packet sequence are determined as (p1, -p2, p3, -p4, …, pnj) according to the packet characteristics of each DNS request packet in the jth DNS request packet sequence and the transmission direction of each DNS request packet, so that the transmission direction of the DNS request packet corresponding to a packet characteristic can be identified from the sign of that characteristic.
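The direction-dependent sign can be sketched as below. The direction labels "to_resolver" and "to_client" and the function name are hypothetical names chosen for this illustration; the parameter selects which of the two sign conventions described above is used.

```python
def signed_features(packets, client_to_resolver_sign=1):
    """Map (size, direction) pairs to signed sizes P_i = s_i * p_i."""
    signs = {
        "to_resolver": client_to_resolver_sign,
        "to_client": -client_to_resolver_sign,
    }
    return [signs[direction] * size for size, direction in packets]


# Hypothetical packet sizes with their transmission directions.
packets = [(2, "to_resolver"), (3, "to_client"), (3, "to_resolver")]
print(signed_features(packets))      # [2, -3, 3]
print(signed_features(packets, -1))  # [-2, 3, -3]
```

With either convention, two packets of equal size travelling in opposite directions become distinct feature members.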
By this embodiment, the packet characteristics (p1, …, pi, …, pnj) of each DNS request packet in the jth DNS request packet sequence are obtained, and the characteristics (P1, …, Pi, …, Pnj) of the jth DNS request packet sequence are determined according to the packet characteristics of each DNS request packet in the jth DNS request packet sequence and the transmission direction of each DNS request packet, so that the technical effect of improving the identification efficiency of the DNS request packet can be achieved, and the technical problem that it is difficult to effectively identify the encrypted DNS request packet in the related art is solved.
As an optional scheme, the obtaining the packet characteristics (p1, …, pi, …, pnj) of each DNS request packet in the jth DNS request packet sequence includes:
obtaining the packet size of each DNS request packet in the jth DNS request packet sequence to obtain the characteristics (p1, …, pi, …, pnj), wherein pi represents the packet size of the ith DNS request packet in the jth DNS request packet sequence.
Optionally, in this embodiment, when the DNS request packet sequence is an encrypted DNS request packet sequence, the data recorded in the DNS request packet is encrypted, so the content specifically included in the DNS request packet cannot be acquired, but the size of the DNS request packet can be. The packet size of the DNS request packet is therefore used as the characteristic of the DNS request packet, so that the purpose of quickly and effectively identifying the encrypted DNS request packet can be achieved.
By this embodiment, the packet size of each DNS request packet in the jth DNS request packet sequence is obtained to obtain the characteristics (p1, …, pi, …, pnj), so that the technical effect of improving the identification efficiency of the DNS request packet can be achieved, and the technical problem that the encrypted DNS request packet is difficult to be effectively identified in the related art is solved.
As an optional solution, the determining a target feature vector of the group of DNS request packet sequences according to the target feature set includes:
acquiring the number Q of different feature members in the target feature set and the occurrence frequency of the Q different feature members in the target feature set, wherein Q is a natural number greater than 1;
and determining the target feature vector as a Q-dimensional feature vector, wherein each vector member in the Q-dimensional feature vector is used for representing the number of times that a corresponding one of the Q different feature members appears in the target feature set.
Optionally, in this embodiment, the different feature members may include, but are not limited to, different packet sizes of the DNS request packets, or different transmission directions of the DNS request packets, or different packet sizes and different transmission directions of the DNS request packets.
For example, fig. 5 is a schematic diagram of an alternative domain name system request identification method according to an embodiment of the present invention, and as shown in fig. 5, the target feature set 502 includes the following contents:
P1=2、P2=3、P3=-3、P4=3、P5=3、P6=3
Since P2 = P4 = P5 = P6, while P1 and P3 each differ from P2 and from one another, the different feature members are the values of P1, P2, and P3, so the number Q of different feature members in the target feature set is 3. The packet feature corresponding to P1 occurs 1 time, the packet feature corresponding to P2 occurs 4 times, and the packet feature corresponding to P3 occurs 1 time, so the target feature vector 504 is the 3-dimensional feature vector (1, 4, 1). Each vector member gives the number of times one feature member appears in the target feature set: the vector member 1 in the first column indicates that the packet feature (2) occurs 1 time, the vector member 4 in the second column indicates that the packet feature (3) occurs 4 times, and the vector member 1 in the third column indicates that the packet feature (-3) occurs 1 time.
The above is merely an example, and the present embodiment is not limited in any way.
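The counting in the fig. 5 example can be reproduced with a short sketch. The function name is an assumption, and the column order (order of first appearance) is chosen here only because it matches the example.

```python
from collections import Counter


def count_vector(feature_members):
    """Return the distinct members and how often each one occurs."""
    counts = Counter(feature_members)
    # Counter keeps first-seen order on Python 3.7+.
    return list(counts), list(counts.values())


members, vector = count_vector([2, 3, -3, 3, 3, 3])
print(members)  # [2, 3, -3]
print(vector)   # [1, 4, 1]
```

The embodiment does not say how columns are assigned when vectors from different groups are given to the same classifier; presumably the mapping from feature member to column would have to be fixed beforehand.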
According to the embodiment, the technical problem that the encrypted DNS request data packet is difficult to effectively identify in the related technology is solved by adopting a mode of acquiring the number Q of different feature members in the target feature set and the occurrence frequency of the Q different feature members in the target feature set and determining the target feature vector as the Q-dimensional feature vector.
As an optional scheme, the acquiring the number Q of different feature members in the target feature set and the number of times that the Q different feature members appear in the target feature set includes:
and under the condition that the group of DNS request data packet sequences comprises K DNS request data packet sequences, the target feature set comprises K × M feature sets of the K DNS request data packet sequences, and the M feature sets of each DNS request data packet sequence are feature sets obtained by extracting features of each DNS request data packet sequence according to M sliding windows with different lengths, acquiring the number Q of different feature members in the K × M feature sets and the occurrence times of the Q different feature members in the K × M feature sets, wherein K is a natural number greater than 1, and M is a natural number greater than 1.
Optionally, in this embodiment, the method may include, but is not limited to, acquiring K × M feature sets obtained in K DNS request packet sequences, and further acquiring the number Q of different feature members in the K × M feature sets and the number of times that the Q different feature members appear in the K × M feature sets.
For example, taking K = 2 and M = 2, with each DNS request packet sequence including 3 DNS request packets, the two DNS request packet sequences may yield, but are not limited to, the following 4 feature sets:
(P1)、(P2)、(P3);
(Q1)、(Q2)、(Q3);
(P1,P2)、(P2,P3);
(Q1,Q2)、(Q2,Q3);
performing a splicing operation on the 4 feature sets to obtain the following feature set:
(P1), (P2), (P3), (Q1), (Q2), (Q3), (P1, P2), (P2, P3), (Q1, Q2), (Q2, Q3), each of which corresponds to a feature value;
taking (P1) = (P3) = (Q3), (P2) = (Q2), (P1) ≠ (Q1), (P2) ≠ (Q1), and (P1) ≠ (P2), and taking (P1, P2), (P2, P3), (Q1, Q2), and (Q2, Q3) as all different from each other, the numbers of occurrences of the different feature members in the K × M feature sets are: 3 for the value of P1, 2 for the value of P2, 1 for the value of Q1, and 1 for each of (P1, P2), (P2, P3), (Q1, Q2), and (Q2, Q3). In this case, the target feature vector is (3, 2, 1, 1, 1, 1, 1).
According to this embodiment, in the case that a group of DNS request data packet sequences comprises K DNS request data packet sequences, the target feature set comprises the K × M feature sets of the K DNS request data packet sequences, and the M feature sets of each DNS request data packet sequence are obtained by extracting features of that sequence according to M sliding windows of different lengths, the number Q of different feature members in the K × M feature sets and the number of occurrences of each of the Q different feature members are acquired. This yields the feature vector that is input into the target classifier to obtain the target classification result, thereby solving the technical problem in the related art that encrypted DNS request data packets are difficult to identify effectively.
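The pooling of K × M feature sets can be sketched as follows. This is an illustrative sketch only: the names `windows` and `pooled_count_vector`, the sample packet features, and the first-appearance column order are assumptions, not part of the described method.

```python
from collections import Counter

def windows(seq, length):
    """All contiguous windows of the given length, as tuples."""
    return [tuple(seq[i:i + length]) for i in range(len(seq) - length + 1)]

def pooled_count_vector(sequences, window_lengths=(1, 2)):
    """Pool the K x M feature sets and count each distinct feature member."""
    counts = Counter()
    for length in window_lengths:      # M window lengths
        for seq in sequences:          # K sequences
            counts.update(windows(seq, length))
    members = list(counts)             # Q distinct members, first-seen order (assumed)
    return members, [counts[m] for m in members]

# Hypothetical signed packet sizes for K = 2 sequences of 3 packets each.
members, vector = pooled_count_vector([[60, -120, 60], [80, -120, 90]])
print(len(members), vector)
```

With K = 2, M = 2, and 3 packets per sequence, 10 feature members are pooled in total, so the entries of `vector` always sum to 10 regardless of how many of them coincide.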
As an optional scheme, the inputting the target feature vector into a target classifier to obtain a target classification result output by the target classifier includes:
under the condition that the target classifier comprises S target decision trees, determining a decision result according to the target feature vector through each target decision tree in the S target decision trees to obtain S decision results, wherein each decision result in the S decision results is used for indicating whether the group of DNS request data packet sequences are malicious DNS request data packet sequences or not, and S is a natural number larger than 1;
and determining the target classification result according to the S decision results.
Optionally, in this embodiment, the target classifier may include, but is not limited to, a target classifier obtained based on random forest model training, the input of the target classifier is the target feature vector, and the output of the target classifier is a result corresponding to a mode of S decision results determined by the S target decision trees.
For example, Fig. 6 is a schematic diagram of another optional method for identifying a domain name system request according to an embodiment of the present invention. As shown in Fig. 6, taking S equal to 3 as an example, the method may include, but is not limited to, the following steps:
S1, acquiring a target feature vector;
S2, inputting the target feature vector into a target classifier;
S3, inputting the target feature vector into each decision tree in the target classifier;
S4, obtaining the classification result output by each decision tree;
S5, taking the classification result that occurs most frequently among the classification results output by the decision trees as the target classification result.
The above is merely an example, and the present embodiment is not limited in any way.
According to the embodiment, under the condition that the target classifier comprises S target decision trees, a decision result is determined according to the target feature vector through each target decision tree in the S target decision trees to obtain S decision results in total, and the target classification result is determined according to the S decision results, so that the technical problem that the encrypted DNS request data packet is difficult to effectively identify in the related technology is solved.
As an optional scheme, the determining the target classification result according to the S decision results includes:
determining the target classification result as the group of DNS request packet sequences being malicious DNS request packet sequences when the number of first decision results in the S decision results is greater than the number of second decision results, where the first decision result is used to indicate that the group of DNS request packet sequences is a malicious DNS request packet sequence, and the second decision result is used to indicate that the group of DNS request packet sequences is not a malicious DNS request packet sequence;
determining the target classification result as the group of DNS request packet sequences not being malicious DNS request packet sequences if the number of the first decision results in the S decision results is less than the number of the second decision results.
Optionally, in this embodiment, determining the target classification result according to the S decision results may include, but is not limited to, a binary determination. For example, "1" is output when a decision result indicates that the group of DNS request packet sequences is a malicious DNS request packet sequence, and "0" is output when a decision result indicates that it is not. The numbers of "1" and "0" in the output are then counted, and whichever of the first decision result and the second decision result is more numerous is taken as the target classification result.
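The binary vote over the S decision results can be sketched as below. The function name `majority_vote` is hypothetical, and the handling of a tie (returning `None`) is an assumption, since the text only covers the "greater than" and "less than" cases.

```python
def majority_vote(decisions):
    """decisions: iterable of 1 (malicious) / 0 (not malicious), one per decision tree."""
    decisions = list(decisions)
    malicious = sum(1 for d in decisions if d == 1)  # first decision results
    benign = len(decisions) - malicious              # second decision results
    if malicious > benign:
        return 1
    if malicious < benign:
        return 0
    return None  # tie: not specified in the text; left undecided here (assumption)

print(majority_vote([1, 0, 1]))
```

Choosing an odd S avoids ties in the binary case.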
As an optional scheme, the determining the target classification result according to the S decision results includes:
determining the target classification result as the group of DNS request packet sequences being a malicious DNS request packet sequence of a target malicious type when third decision results are the most numerous among the S decision results, wherein a third decision result is used to indicate that the group of DNS request packet sequences is a malicious DNS request packet sequence of the target malicious type;
wherein each decision result in the S decision results is used to indicate whether the group of DNS request packet sequences is a malicious DNS request packet sequence and, when it is, further indicates the malicious type to which the group of DNS request packet sequences belongs.
Optionally, in this embodiment, each of the S decision results may further include, but is not limited to, a category and a probability. The category indicates whether the group of DNS request packet sequences is a malicious DNS request packet sequence and, if so, its malicious category; the probability indicates the probability that the group of DNS request packet sequences belongs to the corresponding category.
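When each tree outputs a category rather than a binary flag, the most numerous category can be selected as in the following sketch. The label strings and the use of the vote share as a rough probability estimate are assumptions for illustration; the text does not state how the probability is computed.

```python
from collections import Counter

def plurality_vote(decisions):
    """decisions: one label per tree, e.g. "benign" or a malicious type name."""
    counts = Counter(decisions)
    label, votes = counts.most_common(1)[0]   # most numerous decision result
    return label, votes / len(decisions)      # vote share as a rough probability (assumption)

label, share = plurality_vote(["typeA", "benign", "typeA", "typeB", "typeA"])
print(label, share)
```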
As an optional solution, the method further comprises:
determining a target confidence of the target classification result according to the target feature vector and a training feature vector when the target classification result indicates that the group of DNS request packet sequences is a malicious DNS request packet sequence of a target malicious type, wherein the training feature vector is used for indicating the occurrence times of different feature members in a sample feature set, the sample feature set is a set determined according to a feature set of each sample DNS request packet sequence in the group of sample DNS request packet sequences, and the group of sample DNS request packet sequences are actually the malicious DNS request packet sequence of the target malicious type;
determining the group of DNS request data packet sequences as malicious DNS request data packet sequences of the target malicious type under the condition that the target confidence is larger than a preset confidence threshold;
determining the set of DNS request packet sequences as DNS request packet sequences of unknown class if the target confidence is less than the confidence threshold.
Optionally, in this embodiment, the training feature vector includes, but is not limited to, a training feature vector whose samples are labeled as malicious DNS request packet sequences of the target malicious type.
Optionally, in this embodiment, the confidence threshold may include, but is not limited to, being preset by a system or a server.
According to this embodiment, the target confidence is compared with the confidence threshold to determine whether a group of DNS request data packet sequences is a malicious DNS request data packet sequence of the target malicious type, thereby solving the technical problem in the related art that encrypted DNS request data packets are difficult to identify effectively.
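The threshold check described above can be sketched as follows. The confidence value is taken as an input here and is not computed; the function name `verify_classification` and the treatment of a confidence exactly equal to the threshold as unknown are assumptions, because the text only covers the "greater than" and "less than" cases.

```python
def verify_classification(predicted_type, confidence, threshold):
    """Keep the predicted malicious type only if the confidence exceeds the threshold."""
    if confidence > threshold:
        return predicted_type
    return "unknown"  # also returned when confidence == threshold (assumption)

print(verify_classification("typeA", 0.9, 0.8))
print(verify_classification("typeA", 0.5, 0.8))
```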
As an optional solution, the determining the target confidence of the target classification result according to the target feature vector and the training feature vector includes:
determining a target confidence for the target classification result by the following formula:
[Formula presented as an image in the original publication]
wherein the formula yields the target confidence of the target classification result; Q represents the number of different feature members in the target feature set; the ith member of the target feature vector represents the number of times the ith different feature member in the target feature set appears in the target feature set; and the ith member of the training feature vector represents the average number of occurrences of the ith different feature member of the sample feature set in the sample feature set.
Optionally, in this embodiment, it is assumed that the feature set D extracted from the DNS request packets currently to be detected is input into the target classifier and that the target classifier outputs the classification result A. With the ith member of the training feature vector of category A representing the average number of occurrences of the ith different feature member in the sample feature set, if the confidence computed for D is greater than the confidence threshold, D is determined to belong to category A; otherwise, D is determined not to belong to any classifiable category, that is, it belongs to the unknown category.
As an optional solution, the method further comprises:
repeatedly executing the following steps until the target classifier is determined:
obtaining a group of sample DNS request data packet sequences, wherein each sample DNS request data packet sequence comprises a group of encrypted DNS request data packets;
acquiring a feature set of each sample DNS request data packet sequence in the group of sample DNS request data packet sequences to obtain a sample feature set;
determining a sample feature vector of the group of sample DNS request data packet sequences according to the sample feature set, wherein the sample feature vector is used for representing the times of occurrence of different feature members in the sample feature set;
inputting the sample feature vectors into a sample classifier to obtain a sample classification result output by the sample classifier, wherein the sample classification result is used for indicating whether the group of sample DNS request data packet sequences are malicious DNS request data packet sequences or not, and indicating a malicious type to which the group of sample DNS request data packet sequences belong when the group of sample DNS request data packet sequences are the malicious DNS request data packet sequences;
determining a function value of a preset target loss function according to the sample classification result and an actual classification result of the group of sample DNS request data packet sequences, wherein the actual classification result is used for indicating whether the group of sample DNS request data packet sequences are actually malicious DNS request data packet sequences or not and indicating a malicious type to which the group of sample DNS request data packet sequences belong when the group of sample DNS request data packet sequences are actually malicious DNS request data packet sequences;
in the case that the function value of the target loss function does not meet a preset loss condition, adjusting parameters in the sample classifier;
in the case that the function value of the target loss function meets the loss condition, ending the training of the sample classifier, and determining the sample classifier at the end of training as the target classifier.
Optionally, in this embodiment, the above target loss function may include, but is not limited to, a 0-1 loss function, an absolute value loss function, a logarithmic loss function, a square loss function, an exponential loss function, a hinge loss function, a cross-entropy loss function, and the like.
Optionally, in this embodiment, the above loss condition may include, but is not limited to, convergence of the loss function, or any other preset condition, existing now or arising in the future, that can indicate that the training of the classifier is complete.
Fig. 7 is a schematic diagram of an alternative domain name system request identification method according to an embodiment of the present invention. As shown in Fig. 7, the training and use of the target classifier may include, but are not limited to, the following:
Training process of the sample classifier:
S702, acquiring full traffic;
S704, filtering based on the target IP and port;
S706, determining the DNS encrypted traffic (corresponding to the group of sample DNS request data packet sequences);
S708, performing feature extraction to obtain a sample feature vector;
S710, inputting the sample feature vector into the sample classifier, and training the sample classifier using ten-fold cross-validation and the random forest (RF) algorithm to obtain the target classifier.
Use process of the target classifier:
S712, obtaining the traffic to be detected;
S714, extracting features to obtain a target feature vector;
S716, inputting the target feature vector into the target classifier;
S718, performing classification verification; if the verification passes, executing step S720, otherwise executing step S722;
S720, outputting a result indicating that the traffic to be detected is a malicious DNS request data packet sequence of the target type;
S722, outputting a result indicating that the traffic to be detected is a DNS request data packet sequence of uncertain classification.
Specifically, the following may be included but not limited to:
First, in the training process of the classifier, DNS traffic is filtered from the full traffic using the target IP and port, features are then extracted from the DNS encrypted traffic, and the classifier is trained using the random forest (RF) algorithm. Cross-validation is used so that the evaluation does not depend on a single chance test split, and the classifier is finally generated.
Second, in the judgment process of the classifier, features are extracted from the DNS encrypted traffic to be detected and input into the classifier, and the classifier outputs its results; the most frequent class among the output classification results is the target classification result output by the target classifier. In addition, to guard against the case where the actual class of the input falls outside the range the classifier can classify, classification verification is then performed: the result is output only when the classification verification passes; otherwise, the input is regarded as of uncertain classification.
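To illustrate how ten-fold cross-validation partitions the sample feature vectors, the following sketch generates train/test index splits in plain Python. The helper name `k_fold_indices` is hypothetical, and the folds are contiguous without shuffling or class stratification, which is a simplification assumed here; the text does not describe how the folds are formed.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds (no shuffling)."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # spread the remainder over early folds
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

folds = list(k_fold_indices(25, k=10))
print(len(folds), [len(test) for _, test in folds])
```

Each sample appears in exactly one test fold, so every sample is used for validation once and for training in the other nine rounds.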
With this embodiment, detecting encrypted DNS request data packet sequences in the above manner benefits a variety of scenarios and can improve the coverage of application scenarios, thereby solving the technical problem in the related art that encrypted DNS request data packets are difficult to identify effectively.
The present embodiment is further explained below with reference to specific examples:
Fig. 8 is a schematic diagram of yet another optional identification method for a domain name system request according to an embodiment of the present invention. As shown in Fig. 8, an application scenario that requires a DNS resolution result may include, but is not limited to, using the ranking of the website obtained by DNS resolution to assist in determining whether HTTPS encrypted traffic is malicious. For an encrypted DNS query, the above identification method for a domain name system request can help obtain the website of the DNS query and, further, determine whether the DNS request packet sequence is a malicious DNS request packet sequence.
Specifically, the following steps may be included, but not limited to:
S802, acquiring HTTPS traffic;
S804, determining the DNS traffic (corresponding to the DNS request data packet sequence) according to the destination IP and the time information recorded in the HTTPS traffic;
S806, querying whether the DNS request packet sequence is an encrypted DNS request packet sequence, and if it is determined not to be an encrypted DNS request packet sequence, performing step S812;
S808, if the DNS request packet sequence is determined to be an encrypted DNS request packet sequence, extracting feature information of the DNS request packet sequence and inputting it into a trained DNS encrypted traffic classifier (corresponding to the target classifier);
S810, determining the website type corresponding to the DNS request data packet sequence based on the result of the target classifier, and executing step S812;
S812, performing a website ranking query;
S814, performing other processes.
According to this embodiment, the site currently accessed by the user is identified by generating the feature set corresponding to the encrypted DNS request data packet sequence: n-gram features capture the patterns in the request-response size pairs of the DNS traffic and the local order of the packet size sequence, thereby marking the specific website accessed. A classification model for identifying DNS encrypted traffic fingerprints is then trained with these features, and different classifiers can be selected for different scenarios, so that the coverage of application scenarios is improved and the technical problem in the related art that encrypted DNS request data packets are difficult to identify effectively is solved.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
According to another aspect of the embodiment of the present invention, there is also provided a domain name system request identification apparatus for implementing the above domain name system request identification method. As shown in fig. 9, the apparatus includes:
a first obtaining module 902, configured to obtain a group of DNS request packet sequences of a domain name system, where each DNS request packet sequence includes a group of encrypted DNS request packets;
a second obtaining module 904, configured to obtain a feature set of each DNS request packet sequence in the group of DNS request packet sequences, to obtain a target feature set;
a determining module 906, configured to determine, according to the target feature set, a target feature vector of the group of DNS request packet sequences, where the target feature vector is used to indicate the number of occurrences of different feature members in the target feature set;
an input module 908, configured to input the target feature vector into a target classifier, and obtain a target classification result output by the target classifier, where the target classification result is used to indicate whether the group of DNS request packet sequences is a malicious DNS request packet sequence.
As an optional scheme, the second obtaining module 904 includes:
an extracting unit, configured to extract the features of each DNS request data packet sequence according to sliding windows of different lengths to obtain a plurality of feature sets of each DNS request data packet sequence, wherein the target feature set comprises the plurality of feature sets of each DNS request data packet sequence.
As an optional scheme, the extracting unit is configured to extract the features of each DNS request packet sequence according to sliding windows of different lengths in the following manner, so as to obtain a plurality of feature sets of each DNS request packet sequence:
in a case that the group of DNS request packet sequences includes K DNS request packet sequences, for each DNS request packet sequence, performing the following operation, where K is a natural number greater than 1:
obtaining the characteristics (P1, …, Pi, …, Pnj) of the jth DNS request packet sequence, wherein Pi represents the characteristics of the ith DNS request packet in the jth DNS request packet sequence, nj represents the number of DNS request packets in the jth DNS request packet sequence, 1 ≤ i ≤ nj, and 1 ≤ j ≤ K;
and performing sliding extraction on the features (P1, …, Pi, … and Pnj) according to M sliding windows with different lengths to obtain M feature sets of the jth DNS request packet sequence, wherein M is a natural number greater than 1.
As an alternative, the extracting unit is configured to perform sliding extraction on the features (P1, …, Pi, …, Pnj) according to M sliding windows with different lengths in the following manner, so as to obtain M feature sets of the jth DNS request packet sequence:
performing sliding extraction on the features (P1, …, Pi, …, Pnj) according to a sliding window with the length of 1 to obtain a first feature set (P1), …, (Pi), …, (Pnj);
performing sliding extraction on the features (P1, …, Pi, …, Pnj) according to a sliding window with the length of 2 to obtain a second feature set (P1, P2), …, (Pnj-1, Pnj);
wherein the M feature sets include the first feature set and the second feature set.
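The sliding extraction with window lengths 1 and 2 can be sketched as follows. The function name `sliding_windows` and the sample feature values are illustrative assumptions.

```python
def sliding_windows(features, length):
    """Slide a window of the given length over the features, one step at a time."""
    return [tuple(features[i:i + length]) for i in range(len(features) - length + 1)]

features = [2, 3, -3, 3]                    # hypothetical (P1, ..., Pnj) with nj = 4
first_set = sliding_windows(features, 1)    # (P1), ..., (Pnj)
second_set = sliding_windows(features, 2)   # (P1, P2), ..., (Pnj-1, Pnj)
print(first_set)
print(second_set)
```

A sequence of nj features yields nj members with a window of length 1 and nj - 1 members with a window of length 2.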
As an alternative, the extracting unit is configured to obtain the characteristics (P1, …, Pi, …, Pnj) of the jth DNS request packet sequence by:
obtaining packet characteristics (p1, …, pi, …, pnj) of each DNS request packet in the jth DNS request packet sequence;
determining the characteristics (P1, …, Pi, …, Pnj) of the jth DNS request packet sequence according to the packet characteristics of each DNS request packet in the jth DNS request packet sequence and the transmission direction of each DNS request packet, wherein Pi = si × pi, and pi represents the packet characteristic of the ith DNS request packet in the jth DNS request packet sequence;
wherein si is 1 when the transmission direction of the ith DNS request data packet is from a client to a DNS resolver, and si is-1 when the transmission direction of the ith DNS request data packet is from the DNS resolver to the client; or if the transmission direction of the ith DNS request packet is from the client to the DNS resolver, si is-1, and if the transmission direction of the ith DNS request packet is from the DNS resolver to the client, si is 1.
As an optional solution, the extracting unit is configured to obtain the packet characteristics (p1, …, pi, …, pnj) of each DNS request packet in the jth DNS request packet sequence by:
obtaining the packet size of each DNS request packet in the jth DNS request packet sequence to obtain the characteristics (p1, …, pi, …, pnj), wherein pi represents the packet size of the ith DNS request packet in the jth DNS request packet sequence.
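The direction-signed packet size feature Pi = si × pi can be sketched as below. The function name `signed_features`, the direction tags `"c2r"` (client to DNS resolver) and `"r2c"` (DNS resolver to client), and the sample sizes are hypothetical; the parameter `client_to_resolver_sign` selects between the two sign conventions described above.

```python
def signed_features(packets, client_to_resolver_sign=1):
    """packets: list of (size_in_bytes, direction) with direction "c2r" or "r2c"."""
    features = []
    for size, direction in packets:
        if direction == "c2r":
            sign = client_to_resolver_sign     # si for client -> resolver
        elif direction == "r2c":
            sign = -client_to_resolver_sign    # opposite sign for resolver -> client
        else:
            raise ValueError(f"unknown direction: {direction!r}")
        features.append(sign * size)           # Pi = si * pi
    return features

print(signed_features([(74, "c2r"), (190, "r2c")]))
```

Either convention works as long as the same one is used for both training and detection.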
As an optional solution, the determining module 906 includes:
a first obtaining unit, configured to obtain the number Q of different feature members in the target feature set and the number of times that the Q different feature members appear in the target feature set, where Q is a natural number greater than 1;
a first determining unit, configured to determine the target feature vector as a Q-dimensional feature vector, where each vector member in the Q-dimensional feature vector is used to represent a number of times that a corresponding one of the Q different feature members appears in the target feature set.
As an optional scheme, the first obtaining unit is configured to obtain the number Q of different feature members in the target feature set and the number of times that the Q different feature members appear in the target feature set by:
in the case that the group of DNS request data packet sequences comprises K DNS request data packet sequences, the target feature set comprises the K × M feature sets of the K DNS request data packet sequences, and the M feature sets of each DNS request data packet sequence are obtained by extracting features of that sequence according to M sliding windows of different lengths, acquiring the number Q of different feature members in the K × M feature sets and the number of occurrences of each of the Q different feature members in the K × M feature sets, wherein K is a natural number greater than 1, and M is a natural number greater than 1.
As an alternative, the input module 908 includes:
a second determining unit, configured to determine, by each target decision tree of the S target decision trees, a decision result according to the target feature vector under the condition that the target classifier includes S target decision trees, to obtain S decision results, where each decision result of the S decision results is used to indicate whether the group of DNS request packet sequences is a malicious DNS request packet sequence, and S is a natural number greater than 1;
a third determining unit, configured to determine the target classification result according to the S decision results.
As an optional solution, the third determining unit is configured to determine the target classification result according to the S decision results by:
determining the target classification result as the group of DNS request packet sequences being malicious DNS request packet sequences when the number of first decision results in the S decision results is greater than the number of second decision results, where the first decision result is used to indicate that the group of DNS request packet sequences is a malicious DNS request packet sequence, and the second decision result is used to indicate that the group of DNS request packet sequences is not a malicious DNS request packet sequence;
determining the target classification result as the group of DNS request packet sequences not being malicious DNS request packet sequences if the number of the first decision results in the S decision results is less than the number of the second decision results.
As an optional solution, the third determining unit is configured to determine the target classification result according to the S decision results by:
determining the target classification result as the group of DNS request packet sequences being a malicious DNS request packet sequence of a target malicious type when third decision results are the most numerous among the S decision results, wherein a third decision result is used to indicate that the group of DNS request packet sequences is a malicious DNS request packet sequence of the target malicious type;
wherein each decision result in the S decision results is used to indicate whether the group of DNS request packet sequences is a malicious DNS request packet sequence and, when it is, further indicates the malicious type to which the group of DNS request packet sequences belongs.
As an optional solution, the apparatus is further configured to:
determining a target confidence of the target classification result according to the target feature vector and a training feature vector when the target classification result indicates that the group of DNS request packet sequences is a malicious DNS request packet sequence of a target malicious type, wherein the training feature vector is used for indicating the occurrence times of different feature members in a sample feature set, the sample feature set is a set determined according to a feature set of each sample DNS request packet sequence in the group of sample DNS request packet sequences, and the group of sample DNS request packet sequences are actually the malicious DNS request packet sequence of the target malicious type;
determining the group of DNS request data packet sequences as malicious DNS request data packet sequences of the target malicious type under the condition that the target confidence is larger than a preset confidence threshold;
determining the set of DNS request packet sequences as DNS request packet sequences of unknown class if the target confidence is less than the confidence threshold.
As an alternative, the apparatus is configured to determine the target confidence of the target classification result according to the target feature vector and the training feature vector by:
determining a target confidence for the target classification result by the following formula:
[Formula presented as an image in the original publication]
wherein the formula yields the target confidence of the target classification result; Q represents the number of different feature members in the target feature set; the ith member of the target feature vector represents the number of times the ith different feature member in the target feature set appears in the target feature set; and the ith member of the training feature vector represents the average number of occurrences of the ith different feature member of the sample feature set in the sample feature set.
As an optional solution, the apparatus is further configured to:
repeatedly executing the following steps until the target classifier is determined:
obtaining a group of sample DNS request data packet sequences, wherein each sample DNS request data packet sequence comprises a group of encrypted DNS request data packets;
acquiring a feature set of each sample DNS request data packet sequence in the group of sample DNS request data packet sequences to obtain a sample feature set;
determining a sample feature vector of the group of sample DNS request data packet sequences according to the sample feature set, wherein the sample feature vector is used for representing the times of occurrence of different feature members in the sample feature set;
inputting the sample feature vectors into a sample classifier to obtain a sample classification result output by the sample classifier, wherein the sample classification result is used for indicating whether the group of sample DNS request data packet sequences are malicious DNS request data packet sequences or not, and indicating a malicious type to which the group of sample DNS request data packet sequences belong when the group of sample DNS request data packet sequences are the malicious DNS request data packet sequences;
determining a function value of a preset target loss function according to the sample classification result and an actual classification result of the group of sample DNS request data packet sequences, wherein the actual classification result is used for indicating whether the group of sample DNS request data packet sequences are actually malicious DNS request data packet sequences or not and indicating a malicious type to which the group of sample DNS request data packet sequences belong when the group of sample DNS request data packet sequences are actually malicious DNS request data packet sequences;
under the condition that the function value of the target loss function does not meet a preset loss condition, adjusting parameters in the sample classifier;
and under the condition that the function value of the target loss function meets the loss condition, finishing the training of the sample classifier, and determining the target classifier as the sample classifier when the training is finished.
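The training loop above (classify, evaluate the loss, adjust parameters, stop once the loss condition is met) can be sketched as follows. This is a minimal sketch under stated assumptions: a tiny logistic-regression model trained by gradient descent stands in for the sample classifier, cross-entropy stands in for the target loss function, and the loss condition is taken to be "loss below `max_loss`". The names `train_until_loss_ok`, `max_loss`, `lr`, and `max_rounds` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_until_loss_ok(vectors: Sequence[Sequence[float]],
                        labels: Sequence[int],
                        max_loss: float = 0.1,
                        lr: float = 0.5,
                        max_rounds: int = 10000) -> Tuple[List[float], float, float]:
    """Adjust parameters until the loss meets the (assumed) loss condition."""
    dim = len(vectors[0])
    weights = [0.0] * dim
    bias = 0.0
    loss = float("inf")
    n = len(labels)
    for _ in range(max_rounds):
        # Sample classification result for every sample feature vector.
        preds = [sigmoid(sum(w * x for w, x in zip(weights, v)) + bias) for v in vectors]
        # Function value of the loss against the actual classification result.
        eps = 1e-12
        loss = -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                    for p, y in zip(preds, labels)) / n
        if loss < max_loss:
            break  # loss condition met: training ends
        # Loss condition not met: adjust the parameters.
        for i in range(dim):
            grad = sum((p - y) * v[i] for p, y, v in zip(preds, labels, vectors)) / n
            weights[i] -= lr * grad
        bias -= lr * sum(p - y for p, y in zip(preds, labels)) / n
    return weights, bias, loss
```

The classifier returned when the loop stops plays the role of the target classifier; a decision-tree ensemble would replace the gradient step with its own fitting procedure.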
According to another aspect of the embodiments of the present invention, an electronic device for implementing the above domain name system request identification method is further provided. The electronic device may be the terminal device or the server shown in FIG. 1. This embodiment is described by taking the electronic device being a server as an example. As shown in FIG. 10, the electronic device includes a memory 1002 and a processor 1004; the memory 1002 stores a computer program, and the processor 1004 is configured to execute the steps of any of the above method embodiments by means of the computer program.
Optionally, in this embodiment, the electronic device may be located in at least one network device of a plurality of network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, acquiring a group of domain name system (DNS) request data packet sequences, wherein each DNS request data packet sequence comprises a group of encrypted DNS request data packets;
S2, acquiring a feature set of each DNS request data packet sequence in the group of DNS request data packet sequences to obtain a target feature set;
S3, determining a target feature vector of the group of DNS request data packet sequences according to the target feature set, wherein the target feature vector is used for representing the occurrence times of different feature members in the target feature set;
and S4, inputting the target feature vector into the target classifier to obtain a target classification result output by the target classifier, wherein the target classification result is used for indicating whether the group of DNS request data packet sequences are malicious DNS request data packet sequences.
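Steps S1 to S3 can be sketched as below, drawing on the signed packet sizes and the sliding windows of length 1 and 2 described in the claims. This is an illustrative sketch, not the patented implementation: the direction labels `"c2r"` and `"r2c"`, the function names, and the choice to order vector members by sorting are assumptions. Step S4 would hand the resulting vector to a classifier and is not shown.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Feature = Tuple[int, ...]


def signed_sizes(packets: Sequence[Tuple[int, str]]) -> List[int]:
    """Turn (size, direction) pairs into signed sizes.

    Assumed convention: "c2r" (client to resolver) is +1,
    anything else (e.g. "r2c", resolver to client) is -1.
    """
    return [size if direction == "c2r" else -size for size, direction in packets]


def window_features(seq: Sequence[int],
                    window_lengths: Sequence[int] = (1, 2)) -> List[Feature]:
    """Slide a window of each length over the sequence and collect the tuples."""
    features: List[Feature] = []
    for w in window_lengths:
        for start in range(len(seq) - w + 1):
            features.append(tuple(seq[start:start + w]))
    return features


def target_feature_vector(sequences: Sequence[Sequence[int]],
                          window_lengths: Sequence[int] = (1, 2)):
    """Count how often each distinct feature member occurs across all sequences."""
    counts: Counter = Counter()
    for seq in sequences:
        counts.update(window_features(seq, window_lengths))
    members = sorted(counts)  # fixed member order (assumption) so vectors are comparable
    return members, [counts[m] for m in members]


group = [
    signed_sizes([(100, "c2r"), (200, "r2c"), (100, "c2r")]),
    signed_sizes([(100, "c2r"), (200, "r2c")]),
]
members, vector = target_feature_vector(group)
```

Here `len(members)` corresponds to Q, the number of different feature members, and `vector` is the Q-dimensional count vector.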
Optionally, those skilled in the art can understand that the structure shown in FIG. 10 is only illustrative, and the electronic device may also be a terminal device such as a smartphone (e.g., an Android phone or an iOS phone), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), or a PAD. FIG. 10 illustrates one structure of the electronic device. For example, the electronic device may include more or fewer components (e.g., a network interface) than shown in FIG. 10, or have a configuration different from that shown in FIG. 10.
The memory 1002 may be configured to store software programs and modules, such as the program instructions/modules corresponding to the domain name system request identification method and apparatus in the embodiments of the present invention. The processor 1004 executes various functional applications and data processing by running the software programs and modules stored in the memory 1002, thereby implementing the above domain name system request identification method. The memory 1002 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1002 may further include memory located remotely from the processor 1004, which may be connected to the terminal over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1002 may specifically be used for, but is not limited to, storing information such as DNS request data packets. As an example, as shown in FIG. 10, the memory 1002 may include, but is not limited to, the first obtaining module 902, the second obtaining module 904, the determining module 906, and the input module 908 of the domain name system request identification apparatus. In addition, the memory 1002 may further include, but is not limited to, other module units of the above domain name system request identification apparatus, which are not described again in this example.
Optionally, the above transmission device 1006 is used for receiving or sending data via a network. Examples of the network may include wired networks and wireless networks. In one example, the transmission device 1006 includes a Network Interface Controller (NIC), which can be connected to other network devices and a router via a network cable so as to communicate with the Internet or a local area network. In one example, the transmission device 1006 is a Radio Frequency (RF) module, which is used for communicating with the Internet wirelessly.
In addition, the electronic device further includes: a display 1008 for displaying the DNS request packet; and a connection bus 1010 for connecting the respective module parts in the above-described electronic apparatus.
In other embodiments, the terminal device or the server may be a node in a distributed system, where the distributed system may be a blockchain system, and the blockchain system may be a distributed system formed by connecting a plurality of nodes through a network communication. Nodes can form a Peer-To-Peer (P2P, Peer To Peer) network, and any type of computing device, such as a server, a terminal, and other electronic devices, can become a node in the blockchain system by joining the Peer-To-Peer network.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the domain name system request identification method provided in the various optional implementations described above. The computer program is arranged to perform the steps of any of the above method embodiments when executed.
Alternatively, in the present embodiment, the above-mentioned computer-readable storage medium may be configured to store a computer program for executing the steps of:
S1, acquiring a group of domain name system (DNS) request data packet sequences, wherein each DNS request data packet sequence comprises a group of encrypted DNS request data packets;
S2, acquiring a feature set of each DNS request data packet sequence in the group of DNS request data packet sequences to obtain a target feature set;
S3, determining a target feature vector of the group of DNS request data packet sequences according to the target feature set, wherein the target feature vector is used for representing the occurrence times of different feature members in the target feature set;
and S4, inputting the target feature vector into the target classifier to obtain a target classification result output by the target classifier, wherein the target classification result is used for indicating whether the group of DNS request data packet sequences are malicious DNS request data packet sequences.
Optionally, in this embodiment, a person skilled in the art may understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing hardware associated with the terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (15)

1. A method for identifying a domain name system request, comprising:
acquiring a group of domain name system DNS request data packet sequences, wherein each DNS request data packet sequence comprises a group of encrypted DNS request data packets;
acquiring a feature set of each DNS request data packet sequence in the group of DNS request data packet sequences to obtain a target feature set;
determining a target feature vector of the group of DNS request data packet sequences according to the target feature set, wherein the target feature vector is used for representing the occurrence times of different feature members in the target feature set;
and inputting the target feature vector into a target classifier to obtain a target classification result output by the target classifier, wherein the target classification result is used for indicating whether the group of DNS request data packet sequences are malicious DNS request data packet sequences.
2. The method of claim 1, wherein the obtaining a set of features for each DNS request packet sequence in the set of DNS request packet sequences to obtain a target set of features comprises:
and extracting the features of each DNS request data packet sequence according to sliding windows with different lengths to obtain a plurality of feature sets of each DNS request data packet sequence, wherein the target feature set comprises the plurality of feature sets of each DNS request data packet sequence.
3. The method of claim 2, wherein the extracting the features of each DNS request packet sequence according to the sliding windows with different lengths to obtain a plurality of feature sets of each DNS request packet sequence comprises:
in a case that the group of DNS request packet sequences includes K DNS request packet sequences, for each DNS request packet sequence, performing the following operation, where K is a natural number greater than 1:
obtaining the features $(P_1, \ldots, P_i, \ldots, P_{n_j})$ of the jth DNS request data packet sequence, wherein $P_i$ indicates the feature of the ith DNS request data packet in the jth DNS request data packet sequence, and $n_j$ indicates the number of DNS request data packets in the jth DNS request data packet sequence,
Figure 553265DEST_PATH_IMAGE001
Figure 939247DEST_PATH_IMAGE002
performing sliding extraction on the features $(P_1, \ldots, P_i, \ldots, P_{n_j})$ according to M sliding windows of different lengths to obtain M feature sets of the jth DNS request data packet sequence, wherein M is a natural number greater than 1.
4. The method according to claim 3, wherein the performing sliding extraction on the features $(P_1, \ldots, P_i, \ldots, P_{n_j})$ according to M sliding windows of different lengths to obtain M feature sets of the jth DNS request data packet sequence comprises:
performing sliding extraction on the features $(P_1, \ldots, P_i, \ldots, P_{n_j})$ according to a sliding window of length 1 to obtain a first feature set $(P_1), \ldots, (P_i), \ldots, (P_{n_j})$;
performing sliding extraction on the features $(P_1, \ldots, P_i, \ldots, P_{n_j})$ according to a sliding window of length 2 to obtain a second feature set $(P_1, P_2), \ldots, (P_{n_j-1}, P_{n_j})$;
wherein the M feature sets include the first feature set and the second feature set.
5. The method according to claim 3, wherein the obtaining the features $(P_1, \ldots, P_i, \ldots, P_{n_j})$ of the jth DNS request data packet sequence comprises:
obtaining a data packet feature $(p_1, \ldots, p_i, \ldots, p_{n_j})$ of each DNS request data packet in the jth DNS request data packet sequence;
determining the features $(P_1, \ldots, P_i, \ldots, P_{n_j})$ of the jth DNS request data packet sequence according to the data packet feature of each DNS request data packet in the jth DNS request data packet sequence and the transmission direction of each DNS request data packet, wherein $P_i = s_i \times p_i$, and $p_i$ represents the data packet feature of the ith DNS request data packet in the jth DNS request data packet sequence;
wherein $s_i$ is 1 in the case that the transmission direction of the ith DNS request data packet is from the client to the DNS resolver, and $s_i$ is -1 in the case that the transmission direction of the ith DNS request data packet is from the DNS resolver to the client; or, $s_i$ is -1 in the case that the transmission direction of the ith DNS request data packet is from the client to the DNS resolver, and $s_i$ is 1 in the case that the transmission direction of the ith DNS request data packet is from the DNS resolver to the client.
6. The method of claim 5, wherein the obtaining the data packet feature $(p_1, \ldots, p_i, \ldots, p_{n_j})$ of each DNS request data packet in the jth DNS request data packet sequence comprises:
obtaining the packet size of each DNS request data packet in the jth DNS request data packet sequence to obtain the data packet feature $(p_1, \ldots, p_i, \ldots, p_{n_j})$, wherein $p_i$ indicates the packet size of the ith DNS request data packet in the jth DNS request data packet sequence.
7. The method of claim 1, wherein determining the target feature vector for the set of sequences of DNS request packets based on the target feature set comprises:
acquiring the number Q of different feature members in the target feature set and the occurrence frequency of the Q different feature members in the target feature set, wherein Q is a natural number greater than 1;
and determining the target feature vector as a Q-dimensional feature vector, wherein each vector member in the Q-dimensional feature vector is used for representing the number of times that a corresponding one of the Q different feature members appears in the target feature set.
8. The method of claim 7, wherein the obtaining the number Q of different feature members in the target feature set and the number of times the Q different feature members appear in the target feature set comprises:
and under the condition that the group of DNS request data packet sequences comprises K DNS request data packet sequences, the target feature set comprises K × M feature sets of the K DNS request data packet sequences, and the M feature sets of each DNS request data packet sequence are feature sets obtained by extracting features of each DNS request data packet sequence according to M sliding windows with different lengths, acquiring the number Q of different feature members in the K × M feature sets and the occurrence times of the Q different feature members in the K × M feature sets, wherein K is a natural number greater than 1, and M is a natural number greater than 1.
9. The method of claim 1, wherein the inputting the target feature vector into a target classifier to obtain a target classification result output by the target classifier comprises:
under the condition that the target classifier comprises S target decision trees, determining a decision result according to the target feature vector through each target decision tree in the S target decision trees to obtain S decision results, wherein each decision result in the S decision results is used for indicating whether the group of DNS request data packet sequences are malicious DNS request data packet sequences or not, and S is a natural number larger than 1;
and determining the target classification result according to the S decision results.
10. The method of claim 9, wherein determining the target classification result according to the S decision results comprises:
determining that the target classification result indicates that the group of DNS request packet sequences are malicious DNS request packet sequences when the number of first decision results in the S decision results is greater than the number of second decision results, where the first decision result is used to indicate that the group of DNS request packet sequences are malicious DNS request packet sequences, and the second decision result is used to indicate that the group of DNS request packet sequences are not malicious DNS request packet sequences;
determining that the target classification result indicates that the group of DNS request packet sequences are not malicious DNS request packet sequences if the number of the first decision results in the S decision results is less than the number of the second decision results.
11. The method of claim 9, wherein determining the target classification result according to the S decision results comprises:
determining, when the number of third decision results is the largest among the S decision results, that the target classification result indicates that the group of DNS request packet sequences is a malicious DNS request packet sequence of a target malicious type, wherein the third decision result is used to indicate that the group of DNS request packet sequences is a malicious DNS request packet sequence of the target malicious type;
wherein each decision result in the S decision results is used to indicate whether the group of DNS request packet sequences is a malicious DNS request packet sequence and, when the group of DNS request packet sequences is a malicious DNS request packet sequence, further indicates the malicious type to which the group of DNS request packet sequences belongs.
12. The method according to any one of claims 1 to 11, further comprising:
determining a target confidence of the target classification result according to the target feature vector and a training feature vector when the target classification result indicates that the group of DNS request packet sequences is a malicious DNS request packet sequence of a target malicious type, wherein the training feature vector is used for indicating the occurrence times of different feature members in a sample feature set, the sample feature set is a set determined according to a feature set of each sample DNS request packet sequence in the group of sample DNS request packet sequences, and the group of sample DNS request packet sequences are actually the malicious DNS request packet sequence of the target malicious type;
determining the group of DNS request data packet sequences as malicious DNS request data packet sequences of the target malicious type under the condition that the target confidence is larger than a preset confidence threshold;
determining the set of DNS request packet sequences as DNS request packet sequences of unknown class if the target confidence is less than the confidence threshold.
13. The method of claim 12, wherein determining the target confidence of the target classification result based on the target feature vector and the training feature vector comprises:
determining the target confidence of the target classification result by the following formula:
Figure 187826DEST_PATH_IMAGE003
wherein (Figure 566723DEST_PATH_IMAGE004) represents the target confidence of the target classification result, Q represents the number of different feature members in the target feature set, the target feature vector is (Figure 849937DEST_PATH_IMAGE005), (Figure 406820DEST_PATH_IMAGE006) represents the number of times the ith different feature member in the target feature set appears in the target feature set, the training feature vector is (Figure 142695DEST_PATH_IMAGE007), and (Figure 810437DEST_PATH_IMAGE008) represents the average number of occurrences of the ith different feature member of the sample feature set in the sample feature set.
14. A computer-readable storage medium, characterized in that it comprises a stored program, wherein the program is executable by a terminal device or a computer to perform the method of any one of claims 1 to 13.
15. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 13 by means of the computer program.
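The vote aggregation of claims 9 to 11 can be illustrated with the short sketch below. It assumes each decision tree has already produced its decision result; the function names are hypothetical, and the tie cases (equal counts), which the claims do not address, are handled here by returning `None` for the binary vote and by taking the first-seen label for the typed vote.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_malicious(decisions: Sequence[bool]) -> Optional[bool]:
    """Binary vote over S decision results (True = malicious).

    Returns None on a tie, which the claims leave unspecified (assumption).
    """
    malicious = sum(1 for d in decisions if d)
    benign = len(decisions) - malicious
    if malicious > benign:
        return True
    if malicious < benign:
        return False
    return None


def plurality_type(decisions: Sequence[str]) -> str:
    """Typed vote: the label (e.g. 'benign' or a malicious type) seen most often.

    On a tie, Counter.most_common keeps first-seen order, so the earliest
    label among the tied ones wins (assumption).
    """
    return Counter(decisions).most_common(1)[0][0]
```

For example, three trees voting `["tunnel", "benign", "tunnel"]` would yield the hypothetical malicious type `"tunnel"` as the target classification result.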
CN202110093152.0A 2021-01-25 2021-01-25 Domain name system request identification method, storage medium and electronic device Active CN112422589B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110093152.0A CN112422589B (en) 2021-01-25 2021-01-25 Domain name system request identification method, storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN112422589A true CN112422589A (en) 2021-02-26
CN112422589B CN112422589B (en) 2021-06-08

Family

ID=74782922

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110093152.0A Active CN112422589B (en) 2021-01-25 2021-01-25 Domain name system request identification method, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN112422589B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113438332A (en) * 2021-05-21 2021-09-24 中国科学院信息工程研究所 DoH service identification method and device

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107360159A (en) * 2017-07-11 2017-11-17 中国科学院信息工程研究所 A kind of method and device for identifying abnormal encryption flow
CN109842588A (en) * 2017-11-27 2019-06-04 腾讯科技(深圳)有限公司 Network data detection method and relevant device
CN110247819A (en) * 2019-05-23 2019-09-17 武汉安问科技发展有限责任公司 A kind of Wi-Fi video capture device detection method and system based on encryption stream identification
CN110493208A (en) * 2019-08-09 2019-11-22 南京聚铭网络科技有限公司 A kind of DNS combination HTTPS malice encryption method for recognizing flux of multiple features
CN110557382A (en) * 2019-08-08 2019-12-10 中国科学院信息工程研究所 Malicious domain name detection method and system by utilizing domain name co-occurrence relation
US20200067972A1 (en) * 2016-10-05 2020-02-27 Cisco Technology, Inc. Identifying and using dns contextual flows
CN111224946A (en) * 2019-11-26 2020-06-02 杭州安恒信息技术股份有限公司 TLS encrypted malicious traffic detection method and device based on supervised learning
CN112073551A (en) * 2020-08-26 2020-12-11 重庆理工大学 DGA domain name detection system based on character-level sliding window and depth residual error network
CN112351018A (en) * 2020-10-28 2021-02-09 东巽科技(北京)有限公司 DNS hidden channel detection method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40038349; Country of ref document: HK)