CN116208356B - Virtual currency mining flow detection method based on deep learning - Google Patents


Info

Publication number
CN116208356B
Authority
CN
China
Legal status
Active
Application number
CN202211325209.6A
Other languages
Chinese (zh)
Other versions
CN116208356A (en)
Inventor
付添翼
席少珂
卜凯
任奎
张帆
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN202211325209.6A
Publication of CN116208356A
Application granted
Publication of CN116208356B
Current legal status: Active
Anticipated expiration


Classifications

    • H04L63/1408 Network security: detecting or protecting against malicious traffic by monitoring network traffic
    • G06N3/08 Neural networks: learning methods
    • H04L43/04 Monitoring or testing data switching networks: processing captured monitoring data, e.g. for logfile generation
    • H04L43/0876 Monitoring or testing based on specific metrics: network utilisation, e.g. volume of load or congestion level
    • H04L63/1441 Network security: countermeasures against malicious traffic
    • H04L2209/56 Additional information relating to cryptographic mechanisms: financial cryptography, e.g. electronic payment or e-cash
    • H04L2209/76 Additional information relating to cryptographic mechanisms: proxy, i.e. using an intermediary entity to perform cryptographic operations
    • Y02D30/50 Reducing energy consumption in wire-line communication networks, e.g. low power modes or reduced link rate

Abstract

The invention discloses a deep-learning-based method for detecting virtual currency mining traffic, comprising the following steps: (1) capturing mining traffic and normal traffic in advance, where each captured data flow contains multiple data packets, and extracting and storing the relevant information of each packet; (2) constructing a neural-network-based detection model, converting the data flow of each network connection into multiple detection inputs using the packet length, timestamp, and destination address information of each packet, and training the detection model on these inputs, the detection model consisting of two convolutional layers, two pooling layers, and three fully connected layers; (3) building a real-time detection system in which the trained detection model examines real-time data flows and determines whether each is mining traffic. The invention offers high detection accuracy, strong real-time performance, easy deployment and porting, and applicability to encrypted network environments.

Description

Virtual currency mining flow detection method based on deep learning
Technical Field
The invention relates to the field of blockchain and network security, in particular to a virtual currency mining flow detection method based on deep learning.
Background
Virtual currency refers to digital currency generated on blockchains, such as Bitcoin, Ethereum, and Dogecoin, which is not controlled by government agencies. A blockchain is a decentralized system: rather than depending on one or a few specific network nodes, it relies on a mechanism in which most nodes in the network "vote" on decisions and broadcast the results and information to the whole chain, thereby achieving decentralization. As the markets for various virtual currencies grow, mining (earning revenue by mining virtual currency) is also becoming increasingly common. This brings security problems: to save their own resources, attackers use mining attack techniques to mine virtual currency on other people's devices, seriously infringing on the victims' interests.
Mining attacks (cryptojacking) are highly harmful. Mining uses a computer's Central Processing Unit (CPU) and Graphics Processing Unit (GPU) and keeps them running under extremely high load, which can seriously degrade the performance of the victim's equipment. In addition, a mining attacker may use trojans to uninstall security software, add startup items, add administrator accounts, and disable the firewall on the victim's host, all of which seriously endanger its security. Mining also consumes large amounts of electricity; surveys show that electricity accounts for more than 90% of the total cost of virtual currency mining. Effective detection of mining activity is therefore necessary.
Current mining attacks fall mainly into two types. In the first, an attacker compromises a popular web server and embeds malicious mining code into a website, so that users who browse the website passively mine virtual currency (browser mining for short). In the second, an attacker takes control of a user's computer through malware and uses the host directly for mining (host mining for short).
However, no practical mining detection method has yet been proposed in the existing literature, and most existing methods have obvious defects: poor real-time performance or difficult deployment. They fall into three categories. The first detects mining scripts, e.g. the methods proposed by Geng Hong et al. (How you get shot in the back: A systematical study about cryptojacking in the real world, 2018) and Konoth et al. (Minesweeper: an in-depth look into drive-by cryptocurrency mining and its defense, 2018). The second detects mining software, e.g. the methods proposed by Soviany et al. (Android malware detection and crypto-mining recognition methodology with machine learning, 2018) and Gangwal et al. (Cryptomining cannot change its spots: detecting covert cryptomining using magnetic side-channel, 2019). The third analyzes mining traffic, e.g. the methods proposed by Shize Zhang et al. (MineHunter: A Practical Cryptomining Traffic Detection Algorithm Based on Time Series Tracking, 2021) and Caprolu et al. (Cryptomining makes noise: a machine learning approach for cryptojacking detection, 2019).
The first category targets browser mining and detects mining scripts based on the fact that they typically perform large numbers of hash computations. The most recent such method designs a set of analyzers based on runtime behavior, exploiting inherent properties of virtual currency mining scripts. Because the core of mining is a proof-of-work system whose workload is mostly hash computation, while ordinary web pages spend little time in hash functions, a page can be analyzed by measuring the cumulative time it spends in commonly accessible hash-library interfaces: if a page spends more than 10% of its total time on hash computation, the analyzer suspects that a mining script is running. A further basis for analysis is that the stack depth and call chains of a running mining script are fairly regular, whereas normal web pages rarely call the same stack repeatedly. Another method analyzes the JavaScript code of common web mining tools (such as NFWebMiner and Coinhive) and the characteristics of functions in the wasm module that contain cryptographic operations (XOR, shift, rotation), and derives a set of detection strategies: each function in the bytecode of the wasm module used by the page under test is matched against fingerprints of five cryptographic primitives that mining algorithms need to compute hash values (Keccak, AES, BLAKE-256, Groestl-256, Skein-256), and if enough primitives match completely, the page is considered to contain a mining script.
In addition, the page is suspected of containing a mining script if the number of cryptographic operations per function loop in its wasm module exceeds a certain threshold. Identifying mining from such script characteristics requires the plaintext of the entire page, and the effectiveness of these methods drops markedly if payload obfuscation is applied during network transmission.
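The hash-time heuristic described above reduces to a ratio test. A minimal sketch, assuming a profiler already reports a page's total runtime and its cumulative time inside hash-library interfaces; `PageProfile` and `looks_like_mining_script` are hypothetical names, not part of the cited work.

```python
from dataclasses import dataclass


@dataclass
class PageProfile:
    """Hypothetical per-page runtime profile (times in milliseconds)."""
    total_time_ms: float
    hash_api_time_ms: float  # cumulative time spent inside hash-library interfaces


def looks_like_mining_script(profile: PageProfile, ratio_threshold: float = 0.10) -> bool:
    """Flag a page whose share of runtime spent hashing exceeds the threshold (10% in the text)."""
    if profile.total_time_ms <= 0:
        return False
    return profile.hash_api_time_ms / profile.total_time_ms > ratio_threshold


print(looks_like_mining_script(PageProfile(total_time_ms=1000, hash_api_time_ms=450)))  # True
print(looks_like_mining_script(PageProfile(total_time_ms=1000, hash_api_time_ms=20)))   # False
```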
The second category targets host mining and treats mining software on the host as malware to be detected and monitored. One common approach takes information about the devices on which various types of software run, and the functions or operating-system operations they invoke, as raw features (covering permissions, mobile application settings, device attributes, protocol-related information, and operating-system attributes). It extracts derived features from statistics about the devices and about malware events occurring on the operating system, fuses the raw and derived features and extracts the final features, and trains a Support Vector Machine (SVM) on them to identify malicious mining software. Another approach identifies mining through a magnetic side channel. The underlying idea is that when a CPU performs mining, its high current load changes the surrounding magnetic field strength more sharply: a 10 Hz magnetic probe sensor measures and records the magnetic field strength around the CPU while it performs different operations over a time period (100 samples), and a K-nearest-neighbor algorithm is trained on these sequences to detect mining. Problems with these methods include a narrow detection scope, an inability to identify unknown software, the need for the magnetic sensor to be physically close to the device under test, and difficulty of deployment in large enterprises.
The third category targets both browser mining and host mining, detecting mining through the network transmission characteristics of the mining process. Recently, as network defenses against mining have strengthened (for example, operators cutting off traffic between victim hosts and mining pools through mining-pool IP blocking, DNS pollution, and similar means), the network activity of new mining attacks has become more covert. For example, a trojan can use a proxy tool (such as a VPN) to encrypt its communication while obfuscating traffic characteristics such as packet length, packet count, and inter-packet interval; because the proxy host is the one that connects to the mining pool, network detection based on IP addresses and packet contents is easily bypassed. The most recent detection method aimed at such new mining attacks designs an identification strategy based on time-series tracking, exploiting the correlation between blockchain block production and mining traffic packets: traffic is collected at the gateway ingress and separated into flows by the (source IP, destination IP) pair, and the timestamp of each packet is recorded per flow; within each local time period, the local correlation between a flow's timestamp sequence and the virtual currency's block-production time sequence over the same period is computed, and the likelihood that the flow is mining traffic is finally evaluated from its global correlation. In the other method, packet time intervals, packet sizes, and features derived from them are used to train a random forest, which is evaluated with k-fold cross-validation.
However, these methods recognize encrypted traffic from unknown (untrained) proxies poorly, require manual design and selection of traffic features, demand well-balanced training sets, and need a long detection and confirmation window (waiting for multiple blocks to be produced).
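The time-series-tracking idea behind the third category can be illustrated in simplified form: bin a flow's packet timestamps and the block-production timestamps into the same fixed windows and correlate the counts. This sketch computes a single Pearson correlation over the whole interval rather than the local-then-global scoring described above. The 5-second window and example timestamps are hypothetical, and this is not MineHunter's published algorithm.

```python
import math


def bin_counts(timestamps, start, window, n_windows):
    """Count events falling into each of n_windows consecutive windows of `window` seconds."""
    counts = [0] * n_windows
    for t in timestamps:
        idx = int((t - start) // window)
        if 0 <= idx < n_windows:
            counts[idx] += 1
    return counts


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def block_correlation(packet_ts, block_ts, start, window, n_windows):
    """Correlate per-window packet counts of one flow with per-window block counts."""
    return pearson(bin_counts(packet_ts, start, window, n_windows),
                   bin_counts(block_ts, start, window, n_windows))


blocks = [13.0, 27.0, 41.0]
bursty_flow = [13.1, 13.2, 27.1, 27.3, 41.2, 41.4]  # packets cluster right after each block
print(block_correlation(bursty_flow, blocks, start=0.0, window=5.0, n_windows=10))
```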
Disclosure of Invention
The invention provides a deep-learning-based method for detecting virtual currency mining traffic that offers high detection accuracy, strong real-time performance, easy deployment and porting, suitability for data sets of unbalanced scale, and suitability for encrypted network environments.
A deep-learning-based virtual currency mining traffic detection method comprises the following steps:
(1) Each data flow captured in advance contains multiple data packets; the relevant information of each packet is extracted and stored as a sequence of tuples of the form <timestamp, packet length, source IP, source port, destination IP, destination port>;
(2) A neural-network-based detection model is constructed; the data flow of each network connection is converted into multiple detection inputs using the packet length, timestamp, and destination address information of each packet, and the detection model is trained on these detection inputs;
the detection model consists of two convolutional layers, two pooling layers, and three fully connected layers;
(3) A real-time detection system is built in which the trained detection model examines real-time data flows to determine whether they are mining traffic.
In step (1), the mining traffic comes from virtual currency mining: the Wireshark tool captures the data flow of each network connection during each mining session, with each connection lasting 1 hour. Normal traffic comes from everyday network use, and its data volume is 8 to 15 times that of the mining traffic.
In step (2), each detection input has the format:
$[T_{\mathrm{in}}, T_{\mathrm{out}}, S_{\mathrm{in}}, S_{\mathrm{out}}]$
where T is the time difference between the current packet and the previous packet in the same direction, S is the packet length, and the subscripts in and out denote incoming and outgoing traffic, respectively, determined from the source and destination addresses of each packet.
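A minimal sketch of how one flow's stored tuples could be turned into the four feature sequences. It assumes two details the patent does not specify: direction is decided by comparing the source IP with a known local (monitored) host address, and the first packet in each direction gets T = 0. `PacketRecord`, `extract_features`, and `local_ip` are hypothetical names.

```python
from typing import List, NamedTuple


class PacketRecord(NamedTuple):
    """One stored tuple: <timestamp, packet length, src IP, src port, dst IP, dst port>."""
    timestamp: float
    length: int
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def extract_features(packets: List[PacketRecord], local_ip: str) -> List[list]:
    """Split one flow into the sequences [T_in, T_out, S_in, S_out]."""
    t = {"in": [], "out": []}
    s = {"in": [], "out": []}
    last = {"in": None, "out": None}  # timestamp of the previous packet per direction
    for p in sorted(packets, key=lambda r: r.timestamp):
        d = "out" if p.src_ip == local_ip else "in"
        t[d].append(0.0 if last[d] is None else p.timestamp - last[d])
        s[d].append(p.length)
        last[d] = p.timestamp
    return [t["in"], t["out"], s["in"], s["out"]]


flow = [
    PacketRecord(0.0, 120, "10.0.0.2", 50000, "1.2.3.4", 3333),
    PacketRecord(0.5, 300, "1.2.3.4", 3333, "10.0.0.2", 50000),
    PacketRecord(1.0, 90, "10.0.0.2", 50000, "1.2.3.4", 3333),
    PacketRecord(2.5, 310, "1.2.3.4", 3333, "10.0.0.2", 50000),
]
print(extract_features(flow, local_ip="10.0.0.2"))
```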
During training, for a given data flow, each group of detection inputs takes N consecutive packets for each feature in each direction, so every detection input is a 4×N two-dimensional matrix; features with fewer than N remaining values are padded with 0. Each feature of the next group starts at the position immediately after the last value of that feature used in the current group, and this continues until any one feature is exhausted.
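The grouping rule can be sketched as follows, under one reading of "until any feature is exhausted": groups are emitted while every feature still has data at the current offset, so the group in which the shortest feature runs out is zero-padded and is the last one. `make_detection_inputs` is a hypothetical name.

```python
def make_detection_inputs(features, n):
    """Cut [T_in, T_out, S_in, S_out] into consecutive 4 x n matrices, zero-padding short rows."""
    groups = []
    offset = 0
    # Stop as soon as any feature has no data left at the current offset.
    while all(offset < len(seq) for seq in features):
        group = []
        for seq in features:
            row = list(seq[offset:offset + n])
            row += [0] * (n - len(row))  # pad a feature that ran short with zeros
            group.append(row)
        groups.append(group)
        offset += n
    return groups


features = [
    [0.1, 0.2, 0.3, 0.4, 0.5],   # T_in
    [0.5, 0.6, 0.7],             # T_out
    [100, 110, 120, 130, 140],   # S_in
    [60, 70, 80],                # S_out
]
for g in make_detection_inputs(features, n=2):
    print(g)
```

Here the outgoing direction runs out first, so its rows in the second group are padded and no third group is produced.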
In step (2), the detection model consists of the following layers connected in sequence: a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a first fully connected layer, a second fully connected layer, and a third fully connected layer;
the first convolutional layer has 20 kernels of size 2×20 with stride 2×1; the second convolutional layer has 100 kernels of size 2×20 with stride 2×1; the first and second pooling layers both use a 1×5 window with stride 1×1; and the first, second, and third fully connected layers have 1200, 500, and 100 hidden units, respectively.
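The layer sizes above determine how wide the 4×N input must be. The shape tracer below assumes "valid" convolution and pooling (no padding), which the patent does not state, and N = 100 is only an example value since N is not given. `cnn_shapes` is a hypothetical helper. Note that a 2-row kernel with vertical stride 2 on a 4-row input covers rows (T_in, T_out) and then (S_in, S_out).

```python
def out_len(size, kernel, stride):
    """Output length of an unpadded ('valid') convolution or pooling along one axis."""
    return (size - kernel) // stride + 1


def cnn_shapes(n):
    """Trace (channels, height, width) through the conv/pool stack for a 1 x 4 x n input."""
    c, h, w = 1, 4, n
    # (name, output channels or None for pooling, kernel (h, w), stride (h, w))
    layers = [
        ("conv1", 20, (2, 20), (2, 1)),
        ("pool1", None, (1, 5), (1, 1)),
        ("conv2", 100, (2, 20), (2, 1)),
        ("pool2", None, (1, 5), (1, 1)),
    ]
    shapes = []
    for name, out_c, (kh, kw), (sh, sw) in layers:
        h, w = out_len(h, kh, sh), out_len(w, kw, sw)
        if out_c is not None:
            c = out_c
        if h < 1 or w < 1:
            raise ValueError(f"input width {n} is too small at {name}")
        shapes.append((name, (c, h, w)))
    shapes.append(("flatten", c * h * w))  # size of the vector fed to the 1200-unit layer
    return shapes


for name, shape in cnn_shapes(100):
    print(name, shape)
```

Under these assumptions N must be at least 47, and for N = 100 the flattened vector entering the first fully connected layer has 5400 values. The text does not say how the 100-unit third layer is reduced to the single output score.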
The detection model processes an input as follows. The detection input first enters a convolutional layer, where each convolution kernel is convolved with every region of the input to extract features; the feature values pass through an activation function, and the activation output enters a pooling layer, which shrinks the feature matrix and thus reduces the number of parameters and the computation required during training;
after all convolutional and pooling layers, high-level derived features are obtained for each group of detection inputs; these are passed to the fully connected layers, which use them to classify the input, with dropout applied to prevent overfitting;
finally, the network output represents the correlation coefficient between the corresponding network connection and mining traffic: the larger the value, the more likely the data flow is mining traffic, and when the network output exceeds the detection threshold, that group of inputs is classified as mining traffic.
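The per-group decision can be sketched as below. It assumes the threshold is compared against the sigmoid-mapped score, because the text uses a sigmoid to map outputs into (0, 1) during training but does not say on which scale the threshold applies. `classify_groups` and the example scores are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a raw score into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def classify_groups(raw_outputs, eta=0.5):
    """Return True (mining traffic) for each group whose sigmoid score exceeds eta."""
    return [sigmoid(z) > eta for z in raw_outputs]


print(classify_groups([3.2, -1.5, 0.1], eta=0.5))  # [True, False, True]
```

Raising eta trades recall for fewer false alarms on normal traffic.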
During training, each input sample in the training set is labeled 1 if it belongs to mining traffic and 0 if it belongs to normal-behavior traffic;
the loss is then computed with a cross-entropy function; before computing the loss, a sigmoid function maps the model output for each input into the interval (0, 1). The loss is minimized with the Adam optimizer, which updates the network parameters.
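A plain-Python sketch of the training objective. It assumes a single logit per input, so that cross-entropy over labels 1 and 0 reduces to binary cross-entropy; the sigmoid is folded into the loss for numerical stability. The Adam step uses the usual defaults from the original Adam paper (learning rate 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-8), which the patent does not specify. All function names are hypothetical.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def bce_with_logits(logits, labels):
    """Mean binary cross-entropy; labels are 1 (mining traffic) or 0 (normal traffic)."""
    total = 0.0
    for z, y in zip(logits, labels):
        # log(1 - sigmoid(z)) equals log(sigmoid(-z))
        total -= y * log_sigmoid(z) + (1 - y) * log_sigmoid(-z)
    return total / len(logits)


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias corrections
    v_hat = v / (1 - beta2 ** t)
    theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


print(bce_with_logits([2.0, -1.0], [1, 0]))
```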
In step (3), the real-time detection system is built with DPDK 17.05.2 and uses two processes: one acquires traffic data and the other performs detection;
during detection, the relevant information of each network connection must be stored, including the total number of packets transmitted so far over the connection and the packet length and timestamp of each packet. The acquisition process identifies the network connection to which each packet received on the network port belongs from the packet's header fields and updates that connection's information. When the number of packets on a connection reaches a set size, the stored features of those packets are processed into a group of detection inputs and placed in a cache pool; the detection process continuously consumes the groups in the cache pool and examines them with the detection model.
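The acquisition/detection split is a producer-consumer pattern. The sketch below uses a Python thread and `queue.Queue` as stand-ins for the two DPDK processes and the cache pool; it is not DPDK code. It assumes a connection is identified by the unordered pair of (IP, port) endpoints and replaces the detection model with the placeholder `len`. `Collector`, `connection_key`, and `detector` are hypothetical names.

```python
import queue
import threading
from collections import defaultdict


def connection_key(src_ip, src_port, dst_ip, dst_port):
    """Map both directions of a connection onto one key by ordering the two endpoints."""
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    return (a, b) if a <= b else (b, a)


class Collector:
    """Acquisition side: keeps per-connection state and emits a batch every `batch_size` packets."""

    def __init__(self, pool, batch_size):
        self.pool = pool
        self.batch_size = batch_size
        self.buffers = defaultdict(list)  # key -> [(timestamp, length, src_ip), ...]
        self.totals = defaultdict(int)    # key -> total packets seen on the connection

    def on_packet(self, ts, length, src_ip, src_port, dst_ip, dst_port):
        key = connection_key(src_ip, src_port, dst_ip, dst_port)
        self.totals[key] += 1
        self.buffers[key].append((ts, length, src_ip))
        if len(self.buffers[key]) >= self.batch_size:
            # Hand the buffered packets to the cache pool; shaping them into
            # 4 x N detection inputs could happen here or in the detector.
            self.pool.put((key, self.buffers.pop(key)))


def detector(pool, model, results):
    """Detection side: consumes batches from the cache pool until it receives None."""
    while True:
        item = pool.get()
        if item is None:
            break
        key, batch = item
        results.append((key, model(batch)))


# Usage: one connection, seven packets alternating direction, batches of three.
pool = queue.Queue()
results = []
worker = threading.Thread(target=detector, args=(pool, len, results))
worker.start()
collector = Collector(pool, batch_size=3)
for i in range(7):
    if i % 2 == 0:
        collector.on_packet(float(i), 100, "10.0.0.2", 50000, "1.2.3.4", 3333)
    else:
        collector.on_packet(float(i), 200, "1.2.3.4", 3333, "10.0.0.2", 50000)
pool.put(None)  # sentinel: stop the detector
worker.join()
print(results)
```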
Compared with the prior art, the invention has the following beneficial effects:
1. The invention uses a deep neural network to learn the communication-interaction characteristics of raw encrypted mining traffic and applies generally to the mining traffic of cryptocurrencies that use a PoW consensus mechanism; compared with traditional supervised machine learning algorithms, it saves the labor and time of designing and selecting effective traffic features.
2. It identifies traffic from unknown proxy tools (traffic obfuscation methods) better and suits data sets of unbalanced scale.
3. The neural network design is easy to implement on mainstream open-source software and hardware frameworks and can support real-time traffic detection on a 100G network port.
Drawings
FIG. 1 is a diagram of a network architecture of a detection model in the present invention;
FIG. 2 is a block diagram of a real-time detection system according to the present invention.
Detailed Description
The invention will be described in further detail with reference to the drawings and examples, it being noted that the examples described below are intended to facilitate the understanding of the invention and are not intended to limit the invention in any way.
In the invention, normal traffic and mining traffic are collected at a ratio of 10:1 using the Wireshark tool. In the captured data, each data flow (pcap file) contains multiple data packets; the relevant information of each packet is extracted and stored as a sequence of tuples of the form <timestamp, packet length, source IP, source port, destination IP, destination port>.
Using the packet length, timestamp, destination address, and other information of each packet, each network connection's pcap file is converted into a number of network inputs of the form:
$[T_{\mathrm{in}}, T_{\mathrm{out}}, S_{\mathrm{in}}, S_{\mathrm{out}}]$
where T is the time difference between the current packet and the previous packet in the same direction, S is the packet length, and in and out denote incoming and outgoing traffic, respectively (determined from the source and destination addresses of each packet).
Because a CNN requires fixed-length inputs, each group of inputs for a flow takes N consecutive packets for each feature in each direction, so each input is a 4×N two-dimensional matrix; features with fewer than N remaining values are padded with 0. Each feature of the next group starts at the position immediately after the last value of that feature in the current group, until any one feature is exhausted.
A detection model based on a convolutional neural network is constructed to recognize network inputs and output the recognition result. The network consists of two convolutional layers, two pooling layers, and three fully connected layers, and performs feature extraction, full connection, and overfitting prevention; its structure is shown in FIG. 1.
In feature extraction, the input first enters a convolutional layer, where each convolution kernel is convolved with every region of the input to extract features (more kernels allow more features to be extracted), and the results are passed through an activation function (we chose ReLU). The activation output enters a pooling layer, which shrinks the feature matrix and thus reduces the number of parameters and the computation required during training. We use max pooling, which keeps the maximum value within each region of the feature matrix. The first convolutional layer uses $n_1$ convolution kernels, each of size $2 \times w_1$ with stride $2 \times s_1$, intended to discover relationships between the same feature in different directions. The second convolutional layer applies $n_2$ convolution kernels of size $2 \times w_2$ with stride $2 \times s_2$.
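To make the effect of a 2-row kernel with vertical stride 2 concrete, the sketch below applies one hypothetical 2×3 all-ones kernel (the patent uses width 20) to a tiny 4×6 input, followed by ReLU and 1×2 max pooling (the patent uses 1×5). Because the kernel spans two rows and moves down two rows at a time, the first output row mixes only T_in with T_out and the second only S_in with S_out. These functions are illustrative, not the authors' implementation.

```python
def conv2d_valid(x, kernel, stride):
    """Single-channel unpadded cross-correlation (what CNN conv layers compute) with (sh, sw) stride."""
    kh, kw = len(kernel), len(kernel[0])
    sh, sw = stride
    out_h = (len(x) - kh) // sh + 1
    out_w = (len(x[0]) - kw) // sw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    acc += x[i * sh + a][j * sw + b] * kernel[a][b]
            row.append(acc)
        out.append(row)
    return out


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def max_pool_1xk(m, k, stride=1):
    """1 x k max pooling along the width axis only."""
    return [[max(row[j:j + k]) for j in range(0, len(row) - k + 1, stride)] for row in m]


x = [
    [1, 2, 3, 4, 5, 6],        # T_in
    [0, 0, 0, 0, 0, 0],        # T_out
    [1, 1, 1, 1, 1, 1],        # S_in
    [-5, -5, -5, -5, -5, -5],  # S_out
]
kernel = [[1, 1, 1], [1, 1, 1]]
conv = conv2d_valid(x, kernel, stride=(2, 1))
pooled = max_pool_1xk(relu(conv), k=2)
print(conv)
print(pooled)
```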
After all convolutional and pooling layers, high-level derived features are available for each group of inputs. These features are passed to the fully connected layers, which use them to classify the input, with dropout applied to prevent overfitting.
In summary, the network output for any group of inputs f can be expressed as:
The network output represents the correlation coefficient between the network flow containing f and mining traffic; the larger its value, the more likely that flow is mining traffic. We set a detection threshold η: when the network output exceeds the detection threshold, the group of inputs is considered to belong to mining traffic.
During network training, each input sample in the training set is labeled 1 if it belongs to mining traffic and 0 if it belongs to normal-behavior traffic. The loss is estimated with a cross-entropy function; before computing it, a sigmoid function maps the network output for each input into the interval (0, 1). The loss is minimized with the Adam optimizer, which updates the network parameters; the parameters of each layer of the network are listed in Table 1.
TABLE 1
The invention runs on a server (CPU: 2.8 GHz Intel Core i5-8400; memory: 128 GB). The real-time network traffic detection system is built with DPDK 17.05.2 and uses two processes, one to acquire traffic data and one to detect traffic. First, the relevant information of each network connection must be stored, including the total number of packets transmitted so far over the connection and the packet length and timestamp of each packet. The acquisition process identifies the network connection for each packet received on the network port from the packet's header fields and updates that connection's information; when the number of packets on a connection reaches a certain size, the stored features of those packets are processed into a group of detection inputs and placed in a cache pool. The detection process continuously consumes the detection inputs in the cache pool and examines them with the detection model. The framework of the real-time detection system is shown in FIG. 2.
Because no public mining traffic data set exists, the invention is evaluated on a self-constructed hybrid data set containing both mining traffic and normal-behavior traffic.
The mining traffic mainly comes from Ethereum mining; the Wireshark tool captures the traffic of each mining session, and each connection lasts 1 hour. Data construction must fully account for the possible effects of various proxies and other factors on the traffic characteristics of mining, and mining pools with large hash rates that support TLS communication should be chosen where possible. The mining machines use RTX 2060 and 4× RTX 3090 GPUs, mainly running the NBMiner mining tool. The data set covers 42 mining pools, such as ethermine, flexpool, and f pool; the mining algorithm is Ethash; the mining protocols are mainly Stratum and Ethproxy; the pool connection protocols include TCP and SSL; and various proxy tools are involved, such as OpenVPN, V2Ray, SSR, and Trojan. In total, about 300 mining traffic flows have been collected so far, each containing about 30,000 packets. Normal-behavior traffic mainly comes from everyday network use such as Zoom, YouTube, and web browsing, all generated on the authors' own experimental machines; its total size is about 10 times that of the mining traffic set, and the final data set contains 100,000 groups of data.
The final experimental results show a detection accuracy of 99.9%, a recall of 99.4%, and a detection speed of 8.3 Mpps (million packets per second). These results demonstrate that the invention is feasible, efficient and real-time, and that it addresses the practical problem of detecting mining traffic.
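For reference, the reported figures are standard confusion-matrix metrics plus a packets-per-second throughput. The sketch below shows how they are computed, assuming label 1 denotes mining traffic. Because the original term rendered as "accuracy rate" could also denote precision, both are computed; the function names are hypothetical.

```python
from typing import Dict, Sequence


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision and recall with 1 = mining traffic, 0 = normal traffic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def throughput_mpps(packets: int, seconds: float) -> float:
    """Detection speed in millions of packets per second (Mpps)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return packets / seconds / 1e6
```

For example, `detection_metrics([1, 1, 1, 0], [1, 1, 0, 0])` gives an accuracy of 0.75, a precision of 1.0 and a recall of 2/3: one mining sample is missed, which lowers recall but not precision.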
The foregoing embodiments describe the technical solution and advantages of the invention in detail. It should be understood that they are merely illustrative and do not limit the invention; any modification, addition or equivalent substitution made within the scope of the principles of the invention shall fall within the scope of protection of the invention.

Claims (5)

1. A deep-learning-based method for detecting virtual currency mining traffic, characterized by comprising the following steps:
(1) Each pre-captured data flow comprises a plurality of packets; the relevant information of each packet is extracted and stored as a sequence of tuples in the format <timestamp, packet length, source IP address, source port, destination IP address, destination port>;
(2) A neural-network-based detection model is constructed; the data flow of each network connection is processed into a plurality of detection inputs using the packet length, timestamp and destination address information of each packet, and the detection model is then trained on these detection inputs;
the detection model comprises two convolutional layers, two pooling layers and three fully connected layers, connected in sequence as follows: a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a first fully connected layer, a second fully connected layer and a third fully connected layer;
wherein the first convolutional layer has 20 convolution kernels of size 2×20 with a stride of 2×1; the second convolutional layer has 100 convolution kernels of size 2×20 with a stride of 2×1; the first and second pooling layers each have a window size of 1×5 and a stride of 1×1; and the first, second and third fully connected layers have 1200, 500 and 100 hidden units, respectively;
the detection process of the detection model is as follows: a detection input first enters a convolutional layer, where each convolution kernel is convolved with each region of the input to extract features; the feature values are passed through an activation function, and the activation output enters a pooling layer; the pooling layer reduces the size of the feature matrix, and hence the number of parameters, which reduces the computation required during training;
after all the convolutional and pooling layers, high-level derived features are obtained for each group of detection inputs; these features are then passed to the fully connected layers, which use them to classify the input, with dropout applied to prevent overfitting;
the final network output represents the degree of correlation between the network connection and mining traffic: the larger the value, the higher the probability that the data flow is mining traffic; when the network output exceeds the detection threshold, the detection result for that group of inputs is mining traffic;
(3) A real-time detection system is built, in which the trained detection model examines real-time data flows and determines whether they are mining traffic;
specifically, the real-time detection system is built with DPDK-17.05.2 and uses two processes, one for acquiring traffic data and one for detecting traffic;
during detection, the following information is maintained for each network connection: the total number of packets transmitted so far over the connection, and the length and timestamp of each packet; the acquisition process identifies the network connection of each packet received on the network port from the packet's header fields and updates that connection's information; when the number of packets on a network connection reaches a set threshold, the relevant features of the stored packets of that connection are processed into a group of detection inputs, which is put into a cache pool; the detection process continuously consumes the groups of detection inputs in the cache pool and classifies them with the detection model.
2. The deep-learning-based method for detecting virtual currency mining traffic according to claim 1, wherein in step (1), the mining traffic is generated by virtual currency mining, the data flow of each network connection in each mining session is captured with Wireshark, and each network connection lasts 1 hour; the normal traffic comes from everyday network use, and its data volume is 8 to 15 times that of the mining traffic.
3. The deep-learning-based method for detecting virtual currency mining traffic according to claim 1, wherein in step (2), the detection input has the following format:
wherein T denotes the time difference between the current packet and the previous packet in the same direction, and S denotes the packet length; in and out denote incoming and outgoing traffic, respectively, and are determined from the source and destination addresses of each packet.
4. The deep-learning-based method for detecting virtual currency mining traffic according to claim 3, wherein, when training the detection model, for each data flow, each group of detection inputs takes N consecutive packets for each feature in each direction, so that each detection input is a 4×N two-dimensional matrix, with missing values padded with 0; each feature of the next group of detection inputs starts at the position immediately after the last value taken for that feature in the current group, and this continues until any one of the feature sequences has been used up.
5. The deep-learning-based method for detecting virtual currency mining traffic according to claim 1, wherein, during training of the detection model, each input sample in the training set is labeled 1 if it belongs to mining traffic and 0 if it belongs to normal-behavior traffic;
a loss value is then computed with a binary cross-entropy loss function, where, before the loss is computed, a sigmoid function maps the output of the detection model for each input to the interval (0, 1); the loss is minimized during training with the Adam optimizer, which updates the network parameters.
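Step (1) of claim 1, together with claims 3 and 4, describes how a captured flow becomes a series of 4×N detection inputs. The Python sketch below is one possible reading of those steps. Because the exact matrix layout of claim 3 is not reproduced in this text, the row order `T_in, S_in, T_out, S_out`, the rule that a packet is outgoing when its source IP equals a given `client_ip`, the choice of T = 0 for the first packet in each direction, and the stopping rule (stop once any feature sequence is used up) are assumptions; the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Step (1) tuple: (timestamp, packet length, src IP, src port, dst IP, dst port)
Packet = Tuple[float, int, str, int, str, int]

# Hypothetical row order of the 4 x N detection input.
ROW_ORDER = ("T_in", "S_in", "T_out", "S_out")


def direction_features(packets: Sequence[Packet], client_ip: str) -> Dict[str, List[float]]:
    """Split one flow into per-direction T (time gap) and S (packet length) sequences.

    Assumption: a packet is "out" when its source IP equals client_ip, otherwise "in".
    The first packet seen in each direction gets T = 0.0 because it has no predecessor.
    """
    seqs: Dict[str, List[float]] = {name: [] for name in ROW_ORDER}
    last_ts: Dict[str, float] = {}
    for ts, length, src_ip, _src_port, _dst_ip, _dst_port in packets:
        d = "out" if src_ip == client_ip else "in"
        seqs["T_" + d].append(ts - last_ts[d] if d in last_ts else 0.0)
        seqs["S_" + d].append(float(length))
        last_ts[d] = ts
    return seqs


def make_windows(seqs: Dict[str, List[float]], n: int) -> List[List[List[float]]]:
    """Cut the four sequences into consecutive, non-overlapping 4 x n matrices.

    Rows that run short are zero-padded; windowing stops as soon as any
    feature sequence has been fully used up.
    """
    rows = [seqs[name] for name in ROW_ORDER]
    windows: List[List[List[float]]] = []
    start = 0
    while all(start < len(r) for r in rows):
        window = []
        for r in rows:
            chunk = r[start:start + n]
            window.append(chunk + [0.0] * (n - len(chunk)))
        windows.append(window)
        start += n  # the next group starts right after the last value taken
    return windows
```

With this reading, a flow whose outgoing direction is much longer than its incoming direction yields only as many windows as the shorter direction allows; other readings of the stopping rule would change only the `while` condition.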
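The layer hyperparameters in claim 1 fix the shape of the feature map that reaches the fully connected layers as a function of N. The sketch below computes that shape, assuming unpadded ("valid") convolutions and pooling, which the claims do not state; under that assumption N must be at least 47. The final single-unit output layer is also an assumption, inferred from the single network output that claim 5 passes through a sigmoid. The helper names are hypothetical.

```python
from typing import List, Tuple


def out_len(size: int, kernel: int, stride: int) -> int:
    """Output length along one axis of an unpadded ("valid") convolution or pooling."""
    if size < kernel:
        raise ValueError(f"input length {size} is smaller than kernel {kernel}")
    return (size - kernel) // stride + 1


def feature_map_shape(n: int) -> Tuple[int, int, int]:
    """(channels, height, width) after conv1, pool1, conv2, pool2 for a 4 x n input."""
    h, w = 4, n
    h, w, c = out_len(h, 2, 2), out_len(w, 20, 1), 20    # conv1: 20 kernels, 2x20, stride 2x1
    h, w = out_len(h, 1, 1), out_len(w, 5, 1)            # pool1: window 1x5, stride 1x1
    h, w, c = out_len(h, 2, 2), out_len(w, 20, 1), 100   # conv2: 100 kernels, 2x20, stride 2x1
    h, w = out_len(h, 1, 1), out_len(w, 5, 1)            # pool2: window 1x5, stride 1x1
    return c, h, w


def layer_widths(n: int) -> List[int]:
    """Flattened input width, the three fully connected layers, then one output unit."""
    c, h, w = feature_map_shape(n)
    return [c * h * w, 1200, 500, 100, 1]  # the final 1-unit output layer is assumed
```

For example, with N = 100 the feature map is 100 channels × 1 × 54, so the first fully connected layer receives 5,400 inputs. The 2×1 stride in height means each convolution pairs two adjacent rows of the 4×N input, halving the height at each convolutional layer.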
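The convolution, activation and pooling steps described in claim 1 can be illustrated on a single channel. The sketch below assumes ReLU as the activation function and max pooling, neither of which is specified in the claims, and, as in common deep-learning frameworks, implements "convolution" as cross-correlation. A real layer would sum such maps over all input channels and repeat them for each of its 20 or 100 kernels; the helper names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def conv2d(x: Matrix, k: Matrix, stride: Tuple[int, int] = (1, 1), bias: float = 0.0) -> Matrix:
    """Unpadded 2-D cross-correlation of one input channel with one kernel."""
    kh, kw = len(k), len(k[0])
    sh, sw = stride
    out_h = (len(x) - kh) // sh + 1
    out_w = (len(x[0]) - kw) // sw + 1
    return [
        [bias + sum(x[i * sh + a][j * sw + b] * k[a][b] for a in range(kh) for b in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]


def relu(x: Matrix) -> Matrix:
    """Assumed activation function: element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in x]


def max_pool(x: Matrix, window: Tuple[int, int] = (1, 5),
             stride: Tuple[int, int] = (1, 1)) -> Matrix:
    """Assumed pooling type: maximum over each window (claim 1 gives 1x5 windows, stride 1x1)."""
    wh, ww = window
    sh, sw = stride
    out_h = (len(x) - wh) // sh + 1
    out_w = (len(x[0]) - ww) // sw + 1
    return [
        [max(x[i * sh + a][j * sw + b] for a in range(wh) for b in range(ww))
         for j in range(out_w)]
        for i in range(out_h)
    ]
```

For example, a 4×6 input convolved with a 2×2 kernel at stride 2×1 yields a 2×5 map, and 1×3 max pooling at stride 1×1 then reduces it to 2×3, which is the size reduction the claim attributes to the pooling layer.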
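Claim 1 states that the fully connected layers classify the derived features, with dropout applied to prevent overfitting. A minimal sketch of one fully connected layer and of inverted dropout follows. The dropout rate is not given in the claims, so `p` is left as a free parameter, and the helper names are hypothetical.

```python
import random
from typing import List, Optional


def dense(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: y[j] = sum_i weights[j][i] * x[i] + bias[j] (no activation)."""
    return [sum(w_ji * x_i for w_ji, x_i in zip(row, x)) + b_j
            for row, b_j in zip(weights, bias)]


def dropout(x: List[float], p: float, training: bool,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout: while training, zero each unit with probability p and
    scale the survivors by 1 / (1 - p); at inference time return x unchanged."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(x)
    rng = rng or random.Random()
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in x]
```

Because the surviving units are scaled by 1/(1 - p) during training, the expected activation is unchanged and no rescaling is needed at inference time, when dropout is simply switched off.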
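Claim 5 labels mining samples 1 and normal samples 0, maps each model output to (0, 1) with a sigmoid and minimizes a cross-entropy loss, and claim 1 compares the output with a detection threshold. The sketch below shows these pieces in plain Python. The threshold value of 0.5 and the assumption that the threshold is applied after the sigmoid are illustrative choices, not values taken from the claims. The Adam update itself would normally be left to a framework, for example PyTorch's `torch.optim.Adam` together with `torch.nn.BCEWithLogitsLoss`, which fuses the sigmoid and the loss.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function mapping a raw network output to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_loss(logits: Sequence[float], labels: Sequence[int], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy; labels are 1 for mining traffic, 0 for normal traffic."""
    total = 0.0
    for z, y in zip(logits, labels):
        p = min(max(sigmoid(z), eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(logits)


def is_mining(logit: float, threshold: float = 0.5) -> bool:
    """Flag a group of detection inputs as mining traffic (hypothetical 0.5 threshold)."""
    return sigmoid(logit) > threshold
```

Raising the threshold trades recall for fewer false alarms on normal traffic; the claims leave its value open.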
CN202211325209.6A 2022-10-27 2022-10-27 Virtual currency mining flow detection method based on deep learning Active CN116208356B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211325209.6A CN116208356B (en) 2022-10-27 2022-10-27 Virtual currency mining flow detection method based on deep learning

Publications (2)

Publication Number Publication Date
CN116208356A (en) 2023-06-02
CN116208356B (en) 2023-09-29

Family

ID=86511902

Country Status (1)

Country Link
CN (1) CN116208356B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116996278B (en) * 2023-07-21 2024-01-19 广东技术师范大学 Webpage detection method and device based on mining behavior of WASM module
CN118631589B (en) * 2024-08-09 2024-10-11 四川云互未来科技有限公司 Network traffic supervision abnormality identification early warning method and system

Citations (7)

Publication number Priority date Publication date Assignee Title
CN102821002A (en) * 2011-06-09 2012-12-12 中国移动通信集团河南有限公司信阳分公司 Method and system for network flow anomaly detection
CN107092862A (en) * 2017-03-16 2017-08-25 浙江零跑科技有限公司 A kind of track edge detection method based on convolutional neural networks
CN109120610A (en) * 2018-08-03 2019-01-01 上海海事大学 A kind of fusion improves the intrusion detection method of intelligent ant colony algorithm and BP neural network
WO2019042139A1 (en) * 2017-08-29 2019-03-07 京东方科技集团股份有限公司 Image processing method, image processing apparatus, and a neural network training method
WO2020156348A1 (en) * 2019-01-31 2020-08-06 青岛理工大学 Structural damage identification method based on ensemble empirical mode decomposition and convolution neural network
WO2021114231A1 (en) * 2019-12-11 2021-06-17 中国科学院深圳先进技术研究院 Training method and detection method for network traffic anomaly detection model
WO2022110027A1 (en) * 2020-11-27 2022-06-02 Boe Technology Group Co., Ltd. Computer-implemented image-processing method, image-enhancing convolutional neural network, and computer product

Non-Patent Citations (3)

Title
Dipti Srinivasan; Xin Jin; Ruey Long Cheu. Evaluation of Adaptive Neural Network Models for Freeway Incident Detection. IEEE Transactions on Intelligent Transportation Systems, 2004, pp. 1-11. *
Hou Yuluo. Hybrid intrusion detection model based on a designed autoencoder. Journal of Ambient Intelligence and Humanized Computing, pp. 10799-10809. *
Cui Wenjie (崔文杰). Research on malware behavior detection technology based on recurrent neural networks. China Master's Theses Full-text Database, Information Science and Technology, full text. *

Similar Documents

Publication Publication Date Title
CN116208356B (en) Virtual currency mining flow detection method based on deep learning
CN105208037B (en) A kind of DoS/DDoS attack detectings and filter method based on lightweight intrusion detection
Zhu et al. Network anomaly detection and identification based on deep learning methods
KR102120214B1 (en) Cyber targeted attack detect system and method using ensemble learning
CN101635658B (en) Method and system for detecting abnormality of network secret stealing behavior
Medhat et al. A new static-based framework for ransomware detection
CN111107096A (en) Web site safety protection method and device
Wang et al. Comprehensive evaluation of machine learning countermeasures for detecting microarchitectural side-channel attacks
Zhang et al. Early detection of host-based intrusions in Linux environment
CN111049828B (en) Network attack detection and response method and system
Kajal et al. A hybrid approach for cyber security: improved intrusion detection system using Ann-Svm
Mythreya et al. Prediction and prevention of malicious URL using ML and LR techniques for network security: machine learning
CN116846633A (en) Network threat monitoring and analyzing method and system based on artificial intelligence
Zheng et al. Cryptocurrency malware detection in real-world environment: Based on multi-results stacking learning
Kunku et al. Ransomware Detection and Classification using Machine Learning
Sekar et al. Prediction of distributed denial of service attacks in SDN using machine learning techniques
CN115987687B (en) Network attack evidence obtaining method, device, equipment and storage medium
CN114024748B (en) Efficient Ethernet traffic identification method combining active node library and machine learning
CN109951484A (en) The test method and system attacked for machine learning product
Nocera et al. A user behavior analytics (uba)-based solution using lstm neural network to mitigate ddos attack in fog and cloud environment
Azeroual et al. A framework for implementing an ml or dl model to improve intrusion detection systems (ids) in the ntma context, with an example on the dataset (cse-cic-ids2018)
Valavan et al. Network Intrusion Detection System Based on Information Gain with Deep Bidirectional Long Short-Term Memory.
Mohi-Ud-Din et al. NIDS: Random Forest Based Novel Network Intrusion Detection System for Enhanced Cybersecurity in VANET's
Djenna et al. PARCA: Proactive Anti-Ransomware Cybersecurity Approach
Zhang et al. An automatic approach for scoring vulnerabilities in risk assessment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant