CN113923042A - Malicious software abuse DoH detection and identification system and method - Google Patents

Malicious software abuse DoH detection and identification system and method

Info

Publication number
CN113923042A
Authority
CN
China
Prior art keywords
cluster
sequence
doh
matrix
final
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111245911.7A
Other languages
Chinese (zh)
Other versions
CN113923042B (en)
Inventor
陈伟
张文月
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202111245911.7A
Publication of CN113923042A
Application granted
Publication of CN113923042B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566 Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Virology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a system and method for detecting and identifying malware abuse of DNS over HTTPS (DoH), in the technical field of deep learning and network security. The method comprises the following steps: acquiring pcap traffic packets from a network; extracting time-series features from the pcap traffic packets and then creating packet clusters; generating a cluster sequence based on all of the packet clusters; extracting a final feature set from the cluster sequence; inputting the final feature set into a Transformer model to compute a prediction label; and identifying DoH traffic abused by malware based on the type of the prediction label. The method uses a multi-head attention mechanism to mine the more relevant temporal features in the sequence, which reduces the amount of whole-sequence analysis required and thereby improves both the model's accuracy in detecting malware-generated DoH traffic and its classification performance.

Description

Malicious software abuse DoH detection and identification system and method
Technical Field
The invention relates to a system and method for detecting and identifying malware abuse of DNS over HTTPS (DoH), and belongs to the technical field of deep learning and network security.
Background
The Domain Name System (DNS) is one of the core infrastructure services of today's internet. Its main function is to translate domain names, which are easy for humans to remember, into IP addresses, which are easy for machines to process, and a large number of network services depend on it. DNS was also one of the earliest network protocols found to be vulnerable, and DNS abuse has long been of great interest to network security researchers. To overcome DNS weaknesses related to privacy and data manipulation, the Internet Engineering Task Force introduced DNS over HTTPS (DoH) in RFC 8484. DoH carries DNS queries in Hypertext Transfer Protocol (HTTP) messages sent over Secure Sockets Layer (SSL) or Transport Layer Security (TLS), which has largely succeeded in preventing DNS attacks; at the same time, DoH improves user privacy and security by preventing eavesdropping and manipulation of DNS data. Encrypting traffic provides better privacy, but it also reduces the visibility of network traffic to security tools, which can lower the security level of the network.
Malware includes computer viruses, worms, trojans, bots, and other programs with malicious intent that are designed to disrupt the operation of a computer system, steal proprietary information, or gain access control rights. When malware abuses the DNS protocol, communication between the infected host and the command and control server is typically accomplished using IP-Flux or Domain-Flux techniques. In recent years, the first known malware families that use encryption to hide DNS activity in a DoH tunnel have appeared. One example is the malware named Godlua, which uses HTTPS requests to retrieve the DNS text records of a domain name. These records store the URLs of subsequent command and control servers, to which Godlua connects to obtain further instructions. The technique of retrieving second- or third-stage command and control server URLs from DNS text records is not new. The novelty is that a DoH request is used instead of a traditional DNS request. In this way, the malware hides the frequency of its DNS resolutions. The reduction in network visibility forces administrators to block the use of DoH encryption in their networks, typically by blocking the specific IP addresses of authoritative DoH resolvers. This solution is not perfect, because any malware that wants to hide its DNS traffic can easily set up its own DoH resolver on a non-standard address and port.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a system and method for detecting and identifying malware abuse of DoH that improves detection accuracy.
To achieve this purpose, the invention adopts the following technical solutions:
In a first aspect, the present invention provides a method for detecting and identifying malware abuse of DoH, including:
acquiring pcap traffic packets from a network;
extracting time-series features from the pcap traffic packets and then creating packet clusters;
generating a cluster sequence based on all of the packet clusters;
extracting a final feature set from the cluster sequence;
inputting the final feature set into a Transformer model to compute a prediction label;
and identifying DoH traffic abused by malware based on the type of the prediction label.
Further, the packet cluster is:
C={size,pktCount,direction,duration,interarrivalTime}
where C is the packet cluster, size is the size of the cluster, pktCount is the number of packets in the cluster, direction is the direction of all packets in the cluster, duration is the duration of the cluster, and interarrivalTime is the inter-arrival time.
Further, the final feature set is:
$F_l = \{(C_i, \ldots, C_{i+l}) \mid 1 \le i < n - l\}$
$S = (C_1, \ldots, C_n)$
wherein $F_l$ is the final feature set, $S$ is the cluster sequence, $C_i$ is the i-th cluster in the cluster sequence, $i$ is the cluster index, $C_n$ is the n-th cluster in the cluster sequence, $n$ is the number of clusters in the cluster sequence, and $l$ is the length of each sequence in the final feature set.
Further, the Transformer model comprises an encoder and a decoder, wherein the encoder extracts a sequence matrix based on the time-series features of the final feature set, and the decoder generates a position vector matrix from the extracted sequence matrix.
Further, inputting the final feature set into the Transformer model to compute a prediction label includes:
inputting the final feature set into the Transformer model to obtain a sequence matrix, expressed as:
$Q = \{E_1, \ldots, E_{i-1}, E_i, E_{i+1}, \ldots, E_l\}$
wherein $Q$ is the sequence matrix, $E_i$ is the vectorized representation of the i-th cluster, and $l$ is the length of each sequence in the final feature set;
obtaining the position information of the clusters based on the sequence matrix Q, expressed as:
$PE(pos, 2j) = \sin\left(pos / 10000^{2j/d}\right)$
$PE(pos, 2j+1) = \cos\left(pos / 10000^{2j/d}\right)$
wherein $PE$ is the computed position vector, $pos$ is the position index of the cluster in the sequence, $j \in (0, d)$ is the index of each value in the cluster vector, $2j$ denotes an even position, $2j+1$ denotes an odd position, and $d$ is the embedding dimension of the cluster;
and adding the sequence matrix Q and the position vector matrix PE to obtain a final coding matrix.
Further, inputting the final feature set into the Transformer model to compute a prediction label further includes:
performing multiple linear mappings on the final coding matrix to obtain subsequence codes in different subspaces;
performing self-attention on the subsequence code in each subspace to obtain the subsequence codes $A_i$ weighted by the dependency weights;
concatenating the $A_i$ and applying a linear transformation to obtain a feature matrix, expressed as:
$\alpha = \mathrm{concat}(A_1, \ldots, A_{i-1}, A_i, A_{i+1}, \ldots, A_h)\,W$
wherein $\alpha \in \mathbb{R}^{l \times l}$ is the feature matrix, concat is the concatenation function, $h$ is the number of subsequence codes, and $W \in \mathbb{R}^{hd \times 1}$ is a parameter matrix.
Further, inputting the final feature set into the Transformer model to compute a prediction label further includes:
down-sampling the feature matrix in a global average pooling layer;
inputting the down-sampled feature matrix into a fully connected layer for dimensionality reduction;
inputting the dimension-reduced feature matrix into a Softmax layer for classification to obtain the prediction label;
wherein the prediction label is one of: non-DoH, benign DoH, and malicious DoH.
In a second aspect, the present invention provides a system for detecting and identifying malware abuse of DoH, including:
an acquisition module, configured to acquire pcap traffic packets from a network;
a cluster creation module, configured to extract time-series features from the pcap traffic packets and then create packet clusters;
a cluster sequence generation module, configured to generate a cluster sequence based on all of the packet clusters;
a feature set extraction module, configured to extract a final feature set from the cluster sequence;
a prediction label output module, configured to input the final feature set into a Transformer model to compute a prediction label;
a judging module, configured to identify DoH traffic abused by malware based on the type of the prediction label.
In a third aspect, the present invention provides a device for detecting and identifying malware abuse of DoH, comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of any of the methods described above.
In a fourth aspect, the present invention provides a computer-readable storage medium on which a computer program is stored, the program performing the steps of any of the methods described above when executed by a processor.
Compared with the prior art, the invention has the following beneficial effects:
The method captures pcap traffic packets in real time to detect whether DoH traffic in the network environment is malicious DoH abused by malware. Time-series features are extracted from the DoH traffic packets, and a Transformer self-attention mechanism is then applied, which models the global dependencies between input and output relying entirely on attention. The multi-head attention mechanism mines the more relevant temporal features in the sequence, which reduces the amount of whole-sequence analysis required and thereby improves both the model's accuracy in detecting malware-generated DoH traffic and its classification performance.
Drawings
Fig. 1 is a schematic diagram illustrating a detection and identification process of a malware abuse DoH according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Example one:
A method for detecting and identifying malware abuse of DoH, which identifies such abuse based on time-series features and a self-attention mechanism. Fig. 1 shows the specific flow of the detection method, which comprises the following steps:
Step one: capture pcap traffic packets in the network, extract the time-series features of the data packets, and create a cluster sequence of packets, stored in JSON format, to reduce the dimensionality of the data. Each generated packet cluster has five parameters that characterize it: the size of the cluster (the sum of its data packets in bytes), the number of data packets in the cluster, the direction of all packets in the cluster (incoming or outgoing), the duration of the cluster (the time difference between the first and last packet in the cluster), and the inter-arrival time (the time difference between the current and previous cluster). The details are as follows:
1) A packet cluster is a sequence of one or more consecutive packets in the same direction (having the same source and destination) within a network flow. The rationale for creating packet clusters is to merge such packets so that application-level traffic that was split across several packets by TLS fragmentation and IP fragmentation can be recovered. A cluster timeout threshold t is also taken into account, so that two packets separated by a large time difference do not appear in the same cluster.
2) Traffic shape parameters such as packet size, packet direction, and the time difference between packets are used to infer information about the underlying traffic. Each generated packet cluster is extracted and represented by a five-tuple of features:
C={size,pktCount,direction,duration,interarrivalTime}
where C is the packet cluster, size is the cluster size (the sum of its packets in bytes), pktCount is the number of packets in the cluster, direction is the direction (incoming or outgoing) of all packets in the cluster, duration is the cluster duration (the time difference between the first and last packet in the cluster), and interarrivalTime is the inter-arrival time (the time difference between the current and previous cluster).
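The clustering rule in step one can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Cluster` dataclass, the `build_clusters` function, the `(timestamp, size, direction)` packet tuple, the +1/-1 direction encoding, and the choice to measure the inter-arrival time from the end of the previous cluster to the start of the current one are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    size: int                 # sum of packet sizes in bytes
    pkt_count: int            # number of packets in the cluster
    direction: int            # +1 outgoing, -1 incoming (assumed encoding)
    duration: float           # last packet time minus first packet time
    interarrival_time: float  # gap to the previous cluster (assumed definition)


def build_clusters(packets, timeout=1.0):
    """Group time-sorted (timestamp, size_bytes, direction) packets into clusters.

    A new cluster starts when the direction changes or when the gap to the
    previous packet exceeds the timeout threshold t.
    """
    clusters = []
    current = []     # packets of the cluster being built
    prev_end = None  # timestamp of the last packet of the previous cluster

    def close(pkts, prev_end):
        first_ts, last_ts = pkts[0][0], pkts[-1][0]
        gap = 0.0 if prev_end is None else first_ts - prev_end
        return Cluster(
            size=sum(p[1] for p in pkts),
            pkt_count=len(pkts),
            direction=pkts[0][2],
            duration=last_ts - first_ts,
            interarrival_time=gap,
        )

    for pkt in packets:
        ts, _, direction = pkt
        if current and (direction != current[-1][2] or ts - current[-1][0] > timeout):
            clusters.append(close(current, prev_end))
            prev_end = current[-1][0]
            current = []
        current.append(pkt)
    if current:
        clusters.append(close(current, prev_end))
    return clusters


packets = [(0.0, 100, 1), (0.01, 200, 1), (0.02, 1500, -1), (0.03, 500, -1), (5.0, 80, -1)]
for c in build_clusters(packets, timeout=1.0):
    print(c)
```

In this toy trace the last packet has the same direction as the one before it, but it lands in its own cluster because the 4.97 s gap exceeds the timeout.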
Step two: generate the cluster sequence of packets stored in JSON format. The size of the sequence depends on the network traffic within the flow, and a custom sliding window is used to generate cluster sequences, so that one flow consists of several cluster sequences. The details are as follows:
Through the cluster generation process, any network flow can be represented as a cluster sequence S:
$S = (C_1, \ldots, C_n)$
where $C_n$ is the n-th cluster in the cluster sequence and the sequence size n depends on the network traffic within the flow. A sliding window of length l is used to generate cluster sequences, and cluster sequences shorter than l are padded with empty clusters. With l as a hyper-parameter giving the number of clusters in a sequence, the final feature set $F_l$ extracted from the cluster sequence S is expressed as:
$F_l = \{(C_i, \ldots, C_{i+l}) \mid 1 \le i < n - l\}$
where $F_l$ is the final feature set, $S$ is the cluster sequence, $C_i$ is the i-th cluster in the cluster sequence, $i$ is the cluster index, $C_n$ is the n-th cluster in the cluster sequence, $n$ is the number of clusters in the cluster sequence, and $l$ is the length of each sequence in the final feature set. The value of l must be tuned to find the optimal value that gives the best detection performance. Finding the best value of l is a trade-off between detection accuracy and response time.
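The sliding-window step can be illustrated with a short NumPy sketch. It is written under stated assumptions: each cluster is a plain 5-tuple of numbers, an empty cluster is represented by all zeros, the window advances one cluster at a time, and each window holds exactly l clusters (following the prose description of a window of length l). The names `sliding_windows` and `EMPTY_CLUSTER` are hypothetical.

```python
import numpy as np

# Assumed representation of an "empty cluster" used for padding.
EMPTY_CLUSTER = (0.0, 0.0, 0.0, 0.0, 0.0)


def sliding_windows(clusters, l):
    """Return an array of shape (num_windows, l, 5) of consecutive cluster windows."""
    seq = list(clusters)
    if len(seq) < l:
        # Sequences shorter than l are padded with empty clusters.
        seq += [EMPTY_CLUSTER] * (l - len(seq))
    windows = [seq[i:i + l] for i in range(len(seq) - l + 1)]
    return np.array(windows, dtype=float)


# Six toy clusters: (size, pktCount, direction, duration, interarrivalTime)
flow = [(100 * k, k, 1, 0.01 * k, 0.5) for k in range(1, 7)]
print(sliding_windows(flow, l=4).shape)      # (3, 4, 5)
print(sliding_windows(flow[:2], l=4).shape)  # (1, 4, 5), padded
```

A larger l gives the model more context per sample but delays the first prediction until more clusters have been observed, which is the accuracy versus response-time trade-off mentioned above.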
Step three: build the Transformer model. The Transformer adopts an encoder-decoder structure, and both substructures model the extracted time-series features mainly through a multi-head attention mechanism. The input to the model must pass through both substructures simultaneously. The encoder models the temporal relationships between clusters in the source sequence, and the decoder generates new information from the information vectors extracted by the encoder. Both the encoder and the decoder use multi-head attention: a position embedding layer represents the temporal order within the sequence, and a multi-head self-attention layer extracts information about the clusters in the sequence. The details are as follows:
1) The input layer, which feeds both the encoder and the decoder, accepts input of shape (l, 5), where 5 is the number of parameters contained in a cluster. This yields the vectorized representation of the sequence:
$Q = \{E_1, \ldots, E_{i-1}, E_i, E_{i+1}, \ldots, E_l\}$
wherein $Q$ is the sequence matrix, $E_i$ is the vectorized representation of the i-th cluster, and $l$ is the length of each sequence in the final feature set;
2) In both substructures, the input matrix undergoes a position encoding operation. The model does not use a recurrent structure such as a recurrent neural network, so it cannot capture sequence order directly. Sequence order is nevertheless very important because it represents the global structure, so the relative or absolute position information of the clusters in the sequence must be used. The position information is computed as:
$PE(pos, 2j) = \sin\left(pos / 10000^{2j/d}\right)$
$PE(pos, 2j+1) = \cos\left(pos / 10000^{2j/d}\right)$
wherein $PE$ is the computed position vector, $pos$ is the position index of the cluster in the sequence, $j \in (0, d)$ is the index of each value in the cluster vector $C_i$, even positions $2j$ are encoded with sine, odd positions $2j+1$ are encoded with cosine, and $d$ is the embedding dimension of the cluster.
3) The sequence matrix Q has the same dimensions as the position vector matrix PE, and the two matrices are added to obtain the final coding matrix.
4) In the multi-head attention computation, each head is one linear mapping. The final coding matrix is linearly mapped multiple times to obtain subsequence codes in different subspaces. Self-attention is computed on the subsequence code in each subspace, and the subsequence code weighted by the dependency weights is denoted $A_i$. The extracted $A_i$ are concatenated and linearly transformed to obtain the feature matrix α:
$\alpha = \mathrm{concat}(A_1, \ldots, A_{i-1}, A_i, A_{i+1}, \ldots, A_h)\,W$
wherein $\alpha \in \mathbb{R}^{l \times l}$ is the feature matrix, concat is the concatenation function, $h$ is the number of subsequence codes, and $W \in \mathbb{R}^{hd \times 1}$ is a parameter matrix.
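Items 2) to 4) can be sketched in NumPy as follows. This is an illustrative sketch only: the text does not spell out the per-head computation, so standard scaled dot-product attention with separate query, key and value projections is assumed; the weights are random rather than trained; positions are counted from 0; and the embedding dimension d = 8, the head count h = 2, and the output shape (l, d) are choices of this example rather than values stated in the patent.

```python
import numpy as np


def positional_encoding(l, d):
    """Sinusoidal position matrix PE of shape (l, d)."""
    pos = np.arange(l)[:, None]                  # cluster positions 0..l-1
    two_j = np.arange(0, d, 2)[None, :]          # even indices 2j
    angle = pos / np.power(10000.0, two_j / d)   # pos / 10000^(2j/d)
    pe = np.zeros((l, d))
    pe[:, 0::2] = np.sin(angle)                  # even columns use sine
    pe[:, 1::2] = np.cos(angle[:, : d // 2])     # odd columns use cosine
    return pe


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)      # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(X, h, rng):
    """Project X into h subspaces, attend in each, concatenate, then apply W."""
    l, d_model = X.shape
    d_head = d_model // h
    heads = []
    for _ in range(h):
        # One set of linear mappings per head (random stand-ins for learned weights).
        Wq = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
        Wk = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
        Wv = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
        q, k, v = X @ Wq, X @ Wk, X @ Wv
        weights = softmax(q @ k.T / np.sqrt(d_head))  # (l, l) dependency weights
        heads.append(weights @ v)                     # weighted subsequence code A_i
    W = rng.standard_normal((h * d_head, d_model)) / np.sqrt(h * d_head)
    return np.concatenate(heads, axis=-1) @ W


rng = np.random.default_rng(0)
l, d = 6, 8
Q = rng.standard_normal((l, d))           # stand-in for the vectorized clusters E_1..E_l
encoded = Q + positional_encoding(l, d)   # final coding matrix
alpha = multi_head_self_attention(encoded, h=2, rng=rng)
print(alpha.shape)  # (6, 8)
```

Because attention itself is order-agnostic, the position matrix added to Q is what lets the heads distinguish an early cluster from a late one.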
Step four: the second layer of the detection model is a global average pooling layer. After the feature matrix α is obtained, a window is slid over the feature map (similar to the window sliding of a convolution) and the average value within the window is taken as the result, so that a tensor α of size W × H × D becomes a tensor g of size 1 × 1 × D. Down-sampling the feature matrix in the global average pooling layer reduces overfitting. Here α is the original feature map and D is the number of sequence files; the number of feature maps equals the number of sequence files, and the average of each feature map is computed as:
$g_i = \mathrm{avg}(\alpha_i)$
where $g_i$ is the result of averaging the i-th feature map.
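As a small worked example of the pooling formula, the sketch below averages each of the D feature maps of a W × H × D tensor down to a single value. The function name `global_average_pool` and the toy tensor are assumptions for illustration.

```python
import numpy as np


def global_average_pool(alpha):
    """Average each feature map: (W, H, D) -> (D,), i.e. g_i = avg(alpha_i)."""
    return alpha.mean(axis=(0, 1))


alpha = np.arange(24, dtype=float).reshape(2, 3, 4)  # W=2, H=3, D=4
g = global_average_pool(alpha)
print(g)  # [10. 11. 12. 13.]
```

The pooling layer has no trainable parameters, which is one reason it helps limit overfitting compared with flattening the feature matrix into a large fully connected layer.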
Step five: after $g_i$ is passed through the fully connected layer for dimensionality reduction, it is input to a Softmax layer for classification, and the cluster sequence classification yields the prediction label (whether malicious DoH is present):
[Equation image BDA0003320846400000091 not reproduced in the text]
wherein the final output dimension of the fully connected (Dense) layer is 3, representing three categories: non-DoH, benign DoH, and malicious DoH.
[Equation image BDA0003320846400000092 not reproduced in the text]
The class with the largest probability is taken as the classification result, so that malicious DoH can be detected.
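The final classification step can be sketched as a dense layer followed by a softmax and an argmax. The exact formulas in the two equation images are not available in the text, so this sketch uses the standard softmax definition; the weight matrix `W`, bias `b`, the function name `classify`, and the order of the labels within the output vector are assumptions.

```python
import numpy as np

# Assumed index order of the three output classes.
LABELS = ("non-DoH", "benign DoH", "malicious DoH")


def classify(g, W, b):
    """Map a pooled feature vector g to (label, class probabilities)."""
    logits = g @ W + b               # dense layer with 3 outputs
    logits = logits - logits.max()   # subtract max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return LABELS[int(np.argmax(probs))], probs


g = np.array([1.0, 0.0])
W = np.array([[0.0, 1.0, 3.0],
              [0.0, 0.0, 0.0]])      # toy weights, not trained values
b = np.zeros(3)
label, probs = classify(g, W, b)
print(label)  # malicious DoH
```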
Example two:
A system for detecting and identifying malware abuse of DoH, comprising:
an acquisition module, configured to acquire pcap traffic packets from a network;
a cluster creation module, configured to extract time-series features from the pcap traffic packets and then create packet clusters;
a cluster sequence generation module, configured to generate a cluster sequence based on all of the packet clusters;
a feature set extraction module, configured to extract a final feature set from the cluster sequence;
a prediction label output module, configured to input the final feature set into a Transformer model to compute a prediction label;
a judging module, configured to identify DoH traffic abused by malware based on the type of the prediction label.
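One way to picture how the six modules fit together is as a chain of callables, sketched below. The class name `DetectionSystem`, its field names, and the stub lambdas are hypothetical and exist only to show the data flow; a real system would plug in a packet capture routine, the clustering and windowing logic, and a trained Transformer.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class DetectionSystem:
    acquire: Callable[[Any], Any]            # acquisition module
    create_clusters: Callable[[Any], Any]    # cluster creation module
    generate_sequence: Callable[[Any], Any]  # cluster sequence generation module
    extract_features: Callable[[Any], Any]   # feature set extraction module
    predict_label: Callable[[Any], str]      # prediction label output module

    def judge(self, source: Any) -> bool:
        """Judging module: True when the predicted label is malicious DoH."""
        packets = self.acquire(source)
        clusters = self.create_clusters(packets)
        sequence = self.generate_sequence(clusters)
        features = self.extract_features(sequence)
        return self.predict_label(features) == "malicious DoH"


# Stub modules, only to demonstrate how data moves between the stages.
system = DetectionSystem(
    acquire=lambda src: list(src),
    create_clusters=lambda pkts: [pkts],
    generate_sequence=lambda clusters: tuple(clusters),
    extract_features=lambda seq: [seq],
    predict_label=lambda feats: "malicious DoH" if feats[0][0] else "non-DoH",
)
print(system.judge([1, 2, 3]))  # True
```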
Example three:
An embodiment of the invention further provides a device for detecting and identifying malware abuse of DoH, comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the following method steps:
acquiring pcap traffic packets from a network;
extracting time-series features from the pcap traffic packets and then creating packet clusters;
generating a cluster sequence based on all of the packet clusters;
extracting a final feature set from the cluster sequence;
inputting the final feature set into a Transformer model to compute a prediction label;
and identifying DoH traffic abused by malware based on the type of the prediction label.
Example four:
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the following method steps:
acquiring pcap traffic packets from a network;
extracting time-series features from the pcap traffic packets and then creating packet clusters;
generating a cluster sequence based on all of the packet clusters;
extracting a final feature set from the cluster sequence;
inputting the final feature set into a Transformer model to compute a prediction label;
and identifying DoH traffic abused by malware based on the type of the prediction label.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and variations without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A method for detecting and identifying malware abuse of DoH, characterized by comprising the following steps:
acquiring pcap traffic packets from a network;
extracting time-series features from the pcap traffic packets and then creating packet clusters;
generating a cluster sequence based on all of the packet clusters;
extracting a final feature set from the cluster sequence;
inputting the final feature set into a Transformer model to compute a prediction label;
and identifying DoH traffic abused by malware based on the type of the prediction label.
2. The method for detecting and identifying malware abuse of DoH according to claim 1, wherein
the packet cluster is:
C={size,pktCount,direction,duration,interarrivalTime}
where C is the packet cluster, size is the size of the cluster, pktCount is the number of packets in the cluster, direction is the direction of all packets in the cluster, duration is the duration of the cluster, and interarrivalTime is the inter-arrival time.
3. The method for detecting and identifying malware abuse of DoH according to claim 2, wherein
the final feature set is:
$F_l = \{(C_i, \ldots, C_{i+l}) \mid 1 \le i < n - l\}$
$S = (C_1, \ldots, C_n)$
wherein $F_l$ is the final feature set, $S$ is the cluster sequence, $C_i$ is the i-th cluster in the cluster sequence, $i$ is the cluster index, $C_n$ is the n-th cluster in the cluster sequence, $n$ is the number of clusters in the cluster sequence, and $l$ is the length of each sequence in the final feature set.
4. The method for detecting and identifying malware abuse of DoH according to claim 1, wherein the Transformer model comprises an encoder and a decoder, the encoder extracts a sequence matrix based on the time-series features of the final feature set, and the decoder generates a position vector matrix from the extracted sequence matrix.
5. The method for detecting and identifying malware abuse of DoH according to claim 1, wherein
inputting the final feature set into the Transformer model to compute a prediction label comprises the following steps:
inputting the final feature set into the Transformer model to obtain a sequence matrix, expressed as:
$Q = \{E_1, \ldots, E_{i-1}, E_i, E_{i+1}, \ldots, E_l\}$
wherein $Q$ is the sequence matrix, $E_i$ is the vectorized representation of the i-th cluster, and $l$ is the length of each sequence in the final feature set;
obtaining the position information of the clusters based on the sequence matrix Q, expressed as:
$PE(pos, 2j) = \sin\left(pos / 10000^{2j/d}\right)$
$PE(pos, 2j+1) = \cos\left(pos / 10000^{2j/d}\right)$
wherein $PE$ is the computed position vector, $pos$ is the position index of the cluster in the sequence, $j \in (0, d)$ is the index of each value in the cluster vector, $2j$ denotes an even position, $2j+1$ denotes an odd position, and $d$ is the embedding dimension of the cluster;
and adding the sequence matrix Q and the position vector matrix PE to obtain a final coding matrix.
6. The method for detecting and identifying malware abuse of DoH according to claim 5, wherein
inputting the final feature set into the Transformer model for calculation to obtain a prediction label further comprises:
performing linear mapping on the final coding matrix multiple times to obtain subsequence codes in different subspaces;
performing self-attention calculation on the subsequence codes in each subspace to obtain the dependency-weighted subsequence codes $A_i$;
concatenating the $A_i$ and applying a linear transformation to obtain a feature matrix, expressed as:
$\alpha = \mathrm{concat}(A_1, \ldots, A_{i-1}, A_i, A_{i+1}, \ldots, A_h)\, W$
wherein $\alpha \in \mathbb{R}^{l \times l}$ is the feature matrix, concat is the concatenation function, $h$ is the number of subsequence codes, and $W \in \mathbb{R}^{hd \times 1}$ is a parameter matrix.
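The multi-subspace attention step of claim 6 can be sketched as follows. This is an illustration under assumptions, not the patented implementation: the claim says only "self-attention calculation", so scaled dot-product attention is assumed here, and the projection matrices `Wq`, `Wk`, `Wv`, the per-head width `d_k`, and the output shape of `W` are hypothetical choices for the example.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, W):
    """X: final coding matrix, shape (l, d).
    Wq, Wk, Wv: per-head linear maps, shape (h, d, d_k) (assumed shapes).
    W: output parameter matrix, shape (h * d_k, d_out) (assumed shape)."""
    heads = []
    for q_proj, k_proj, v_proj in zip(Wq, Wk, Wv):
        # Linear mapping into one subspace
        Qh, Kh, Vh = X @ q_proj, X @ k_proj, X @ v_proj
        # Dependency weights between all pairs of positions
        weights = softmax(Qh @ Kh.T / np.sqrt(Kh.shape[-1]))
        heads.append(weights @ Vh)               # weighted subsequence code A_i
    # alpha = concat(A_1, ..., A_h) W
    return np.concatenate(heads, axis=-1) @ W

rng = np.random.default_rng(0)
l, d, h, d_k = 5, 8, 2, 4
X = rng.normal(size=(l, d))
Wq, Wk, Wv = (rng.normal(size=(h, d, d_k)) for _ in range(3))
W = rng.normal(size=(h * d_k, d))
alpha = multi_head_self_attention(X, Wq, Wk, Wv, W)
```

Each head attends over the whole sequence independently, so different heads can weight different dependencies between clusters before the concatenation mixes them.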
7. The method for detecting and identifying malware abuse of DoH according to claim 6, wherein
inputting the final feature set into the Transformer model for calculation to obtain a prediction label further comprises:
down-sampling the feature matrix with a global average pooling layer;
inputting the down-sampled feature matrix into a fully connected layer for dimensionality reduction;
inputting the dimension-reduced feature matrix into a Softmax layer for classification to obtain the prediction label;
wherein the prediction label is one of: non-DoH, benign DoH, and malicious DoH.
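The classification head of claim 7 can be sketched in a few lines of NumPy. The function name `classify`, the weight names `W_fc` and `b_fc`, and the choice to pool over the sequence axis are assumptions made for the example; the claim does not specify them.

```python
import numpy as np

LABELS = ("non-DoH", "benign DoH", "malicious DoH")

def classify(alpha: np.ndarray, W_fc: np.ndarray, b_fc: np.ndarray):
    """alpha: feature matrix, shape (l, d). W_fc: (d, 3). b_fc: (3,)."""
    pooled = alpha.mean(axis=0)              # global average pooling over positions
    logits = pooled @ W_fc + b_fc            # fully connected layer reduces d -> 3
    logits = logits - logits.max()           # shift for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()   # Softmax layer
    return LABELS[int(np.argmax(probs))], probs

# Toy input: the pooled vector is [1, 1], so the logits are [0, 2, 4].
label, probs = classify(
    np.ones((4, 2)),
    np.array([[0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]),
    np.zeros(3),
)
```

Global average pooling makes the head independent of the sequence length l, so the same fully connected layer works for sequences of any length.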
8. A system for detecting and identifying malware abuse of DoH, comprising:
an acquisition module, configured to obtain pcap traffic packets in a network;
a cluster creation module, configured to extract time-series features from the pcap traffic packets and then create clusters of packets;
a cluster sequence generation module, configured to generate a cluster sequence based on the clusters of all packets;
a feature set extraction module, configured to extract a final feature set from the cluster sequence;
a prediction label output module, configured to input the final feature set into a Transformer model for calculation to obtain a prediction label;
a judging module, configured to determine, based on the type of the prediction label, whether the traffic is malware-abused DoH traffic.
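The chain of modules in claim 8 can be pictured as a simple pipeline of callables. Everything below is a hypothetical sketch: the class name `DoHDetectionPipeline`, its field names, and the stub functions in the usage example are illustrative stand-ins, and the real modules would wrap packet parsing, clustering, and the Transformer model.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class DoHDetectionPipeline:
    acquire: Callable[[str], Any]            # acquisition module
    create_clusters: Callable[[Any], Any]    # cluster creation module
    build_sequence: Callable[[Any], Any]     # cluster sequence generation module
    extract_features: Callable[[Any], Any]   # feature set extraction module
    predict_label: Callable[[Any], str]      # prediction label output module

    def judge(self, source: str) -> bool:
        """Judging module: True when the predicted label is malicious DoH."""
        packets = self.acquire(source)
        clusters = self.create_clusters(packets)
        sequence = self.build_sequence(clusters)
        features = self.extract_features(sequence)
        return self.predict_label(features) == "malicious DoH"

# Usage with stub modules (placeholders only, no real traffic analysis)
pipeline = DoHDetectionPipeline(
    acquire=lambda path: [path],
    create_clusters=lambda packets: packets,
    build_sequence=lambda clusters: clusters,
    extract_features=lambda sequence: sequence,
    predict_label=lambda features: "malicious DoH",
)
is_malicious = pipeline.judge("capture.pcap")
```

Keeping each module as an independent callable mirrors the claim's module boundaries and lets any single stage be replaced or tested in isolation.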
9. A device for detecting and identifying malware abuse of DoH, comprising a processor and a storage medium;
the storage medium is configured to store instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202111245911.7A 2021-10-26 2021-10-26 Detection and identification system and method for malicious software abuse (DoH) Active CN113923042B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111245911.7A CN113923042B (en) 2021-10-26 2021-10-26 Detection and identification system and method for malicious software abuse (DoH)

Publications (2)

Publication Number Publication Date
CN113923042A true CN113923042A (en) 2022-01-11
CN113923042B CN113923042B (en) 2023-09-15

Family

ID=79243014

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111245911.7A Active CN113923042B (en) 2021-10-26 2021-10-26 Detection and identification system and method for malicious software abuse (DoH)

Country Status (1)

Country Link
CN (1) CN113923042B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170134404A1 (en) * 2015-11-06 2017-05-11 Cisco Technology, Inc. Hierarchical feature extraction for malware classification in network traffic
US9705904B1 (en) * 2016-07-21 2017-07-11 Cylance Inc. Neural attention mechanisms for malware analysis
CN108667816A (en) * 2018-04-19 2018-10-16 重庆邮电大学 A kind of the detection localization method and system of Network Abnormal
CN110276439A (en) * 2019-05-08 2019-09-24 平安科技(深圳)有限公司 Time Series Forecasting Methods, device and storage medium based on attention mechanism
CN111669385A (en) * 2020-05-29 2020-09-15 重庆理工大学 Malicious traffic monitoring system fusing deep neural network and hierarchical attention mechanism
CN113316163A (en) * 2021-06-18 2021-08-27 东南大学 Long-term network traffic prediction method based on deep learning
CN113472809A (en) * 2021-07-19 2021-10-01 华中科技大学 Encrypted malicious traffic detection method and system and computer equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Chen Wei; Hu Lei; Yang Long: "Fast identification method for encrypted traffic based on payload features", Computer Engineering (计算机工程), no. 12 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114513341A (en) * 2022-01-21 2022-05-17 上海斗象信息科技有限公司 Malicious traffic detection method, device, terminal and computer readable storage medium
CN114513341B (en) * 2022-01-21 2023-09-12 上海斗象信息科技有限公司 Malicious traffic detection method, malicious traffic detection device, terminal and computer readable storage medium
CN114900360A (en) * 2022-05-12 2022-08-12 国家计算机网络与信息安全管理中心山西分中心 Method for detecting DoH flow in HTTPS flow
CN114900360B (en) * 2022-05-12 2023-09-22 国家计算机网络与信息安全管理中心山西分中心 Method for detecting DoH flow in HTTPS flow

Also Published As

Publication number Publication date
CN113923042B (en) 2023-09-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant