CN114157481A

CN114157481A - Lightweight encryption hijacking attack detection system based on deep learning

Info

Publication number: CN114157481A
Application number: CN202111457492.3A
Authority: CN
Inventors: 邹福泰; 贺皓涵; 王梓帆; 吴越; 昌洵成
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2021-12-02
Filing date: 2021-12-02
Publication date: 2022-03-08

Abstract

The invention discloses a lightweight encryption hijacking (cryptojacking) attack detection system based on deep learning, and relates to the field of computer network security. The system comprises two parts: model training and deployment detection. To address the shortcomings of existing detection systems, the invention uses deep learning to classify mining programs that have been converted into images. It uses a malicious-code vectorization technique to embed the semantic features of the mining program in the image, thereby achieving higher classification accuracy. It also handles heterogeneous inputs: mining programs written in the two mainstream languages (JavaScript and WebAssembly) can be detected at the same time. The invention can be deployed at a campus or enterprise gateway to monitor and inspect daily traffic, and domain names found to host malicious mining scripts are added to a blacklist database. The detection system plays an important role in campus and enterprise network defense and is of positive significance for the real-time detection of, and defense against, encryption hijacking attacks.

Description

Lightweight encryption hijacking attack detection system based on deep learning

Technical Field

The invention relates to the field of computer network security, and in particular to a lightweight encryption hijacking attack detection system based on deep learning.

Background

Cryptocurrency is a medium of exchange that uses the principles of cryptography to secure transactions and to control the creation of transaction units. It is one type of digital currency (or virtual currency). Bitcoin (BTC), one such cryptocurrency, was first proposed by Satoshi Nakamoto in 2008 and formally launched in January 2009 as the first decentralized cryptocurrency in the world. As the earliest cryptocurrency, Bitcoin is based on a decentralized consensus mechanism, in contrast to banking systems that rely on centralized regulatory bodies. The largest difference between Bitcoin and conventional currencies is that it is not issued by a monetary authority but is produced by computation under an algorithm, which is what decentralization means here. Although Bitcoin has no practical use value, speculation in it has never stopped since its birth. At first, ten thousand bitcoins were needed to buy two pizzas; after many years of growth, in April 2021 the price of one bitcoin reached $64,000, equivalent to about RMB 410,000. Riding the Bitcoin wave, successor cryptocurrencies such as Ether (ETH) and Litecoin (LTC) have sprung up like bamboo shoots after spring rain and formed a huge industry chain. Driven by these large profits, criminals have begun to use encryption hijacking attacks to occupy users' computer resources on a large scale for mining, illegally obtaining large amounts of mined currency and causing serious security problems.

An encryption hijacking (cryptojacking) attack is the unauthorized use of another person's computer resources to mine cryptocurrency. The attacker infects websites or online advertisements with JavaScript or WebAssembly (Wasm) programs. When a victim visits a website carrying a mining program, the code is loaded into the victim's browser, occupies significant computer resources, and automatically begins mining without the victim's knowledge. As the value of cryptocurrency keeps growing, malicious mining software has become increasingly rampant, and criminals have started to mine on a large scale through encryption hijacking attacks, causing huge economic losses. These attacks have been highly damaging to today's internet-dependent society: they harm the interests of individual users, and many attacks between 2017 and 2020 also involved various websites and government departments. On August 26, 2021, the Tencent Security Threat Intelligence Center detected that an attacker had injected mining scripts into thousands of hosts, causing serious economic loss. With encryption hijacking attacks this prevalent, a safe, reliable, and efficient means of detecting them is very important.

Therefore, those skilled in the art are working to develop a lightweight encryption hijacking attack detection system based on deep learning: a detection method that can detect multiple kinds of mining scripts at the same time while occupying only a small amount of the user's computer resources, so as to defend against encryption hijacking attacks.

Disclosure of Invention

In view of the above-mentioned drawbacks of the prior art, the technical problem to be solved by the present invention is how to detect and defend against increasingly rampant encryption hijacking attacks. By adopting a deep learning model lightweighting technique and a sample vectorization technique that incorporates program semantics, a large volume of traffic can be inspected effectively, with good support for heterogeneous scripts and good resistance to deception.

In order to achieve the above object, the present invention provides a lightweight encryption hijacking attack detection system based on deep learning, which comprises model training and deployment detection. The model training comprises: collecting a malicious mining script data set; data cleaning, namely screening out samples suitable for training; vectorizing the samples into grayscale images; and designing and training a deep learning neural network for identifying mining scripts. The deployment detection comprises: acquiring the URL to be detected; obtaining the input of the detection system; inferring with the deep model trained to convergence; and adding the domain name to a blacklist or a whitelist.

Further, collecting the malicious mining script data set comprises: obtaining a list of URLs containing JavaScript and WebAssembly scripts from the PublicWWW search engine; crawling the JavaScript and WebAssembly scripts hosted at those URLs; analyzing the malicious code provided by the VirusShare website and selecting the JavaScript and WebAssembly scripts from it; and acquiring JavaScript and WebAssembly samples from relevant open source projects on GitHub.

Further, the data cleaning comprises: deleting duplicate samples from the training data set; and labeling the sample set through VirusTotal, selecting the malicious mining scripts and the normal, safe scripts.
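
The cleaning step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: duplicates are detected by SHA-256 digest of the script body, and `labels` is a hypothetical dictionary standing in for verdicts obtained from VirusTotal (no VirusTotal API call is made here).

```python
import hashlib


def deduplicate(samples):
    """Keep the first occurrence of each distinct script body (bytes)."""
    seen = set()
    unique = []
    for body in samples:
        digest = hashlib.sha256(body).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((digest, body))
    return unique


def split_by_label(unique, labels):
    """Split samples into mining and benign sets; unlabeled samples are dropped.

    `labels` is a hypothetical mapping {sha256_hex: "mining" | "benign"}
    that stands in for externally obtained verdicts.
    """
    mining, benign = [], []
    for digest, body in unique:
        verdict = labels.get(digest)
        if verdict == "mining":
            mining.append(body)
        elif verdict == "benign":
            benign.append(body)
    return mining, benign


raw = [b"startMiner()", b"console.log(1)", b"startMiner()"]
unique = deduplicate(raw)
labels = {
    hashlib.sha256(b"startMiner()").hexdigest(): "mining",
    hashlib.sha256(b"console.log(1)").hexdigest(): "benign",
}
mining, benign = split_by_label(unique, labels)
```

Hashing the raw bytes only removes exact duplicates; near-duplicates (for example, the same miner with different whitespace) would need a normalization step first.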

Further, vectorizing the samples into grayscale images comprises the following steps:

step 101: converting the labeled sample set into a unified text representation by using compiling and disassembling tools;

step 102: analyzing the text representation, locating the section in which the mining semantics of a malicious mining sample reside, and truncating all samples to that same section;

step 103: filling the truncated text representation into images of uniform size through a mapping from numerical values to gray levels, these images forming the training set of the model.
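
Step 103 can be sketched as below. This is an illustrative sketch only: the source does not specify the image size, the padding value, or the exact value-to-gray mapping, so the 8 x 8 size, zero padding, and one-byte-per-pixel mapping are all assumptions.

```python
def bytes_to_grayscale(data, side=32, pad_value=0):
    """Map a byte string to a side x side grid of gray levels (0-255).

    Each byte becomes one pixel; input longer than side*side is truncated
    and shorter input is padded with `pad_value` (assumed conventions).
    """
    capacity = side * side
    clipped = data[:capacity]
    pixels = list(clipped) + [pad_value] * (capacity - len(clipped))
    # Cut the flat pixel list into rows of equal width.
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


# A short WebAssembly-text-like fragment used purely as example input.
image = bytes_to_grayscale(b"(module (func $hash))", side=8)
```

Because every sample is forced to the same dimensions, the resulting images can be batched directly as input to a convolutional network.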

Further, designing and training the deep learning neural network for identifying mining scripts comprises: adding a model lightweighting technique on top of a conventional convolutional neural network to reduce the number of model parameters and the model size; training the model with the training set and tuning parameters to improve prediction performance; and saving the model parameters after the training curve converges.
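
The source does not say which lightweighting technique is used. Purely as an illustration, and as an assumption not drawn from the patent, the arithmetic below compares the weight count of a standard convolution with that of a depthwise separable convolution, one common way of shrinking a convolutional network. The layer sizes are hypothetical.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution plus a 1 x 1 pointwise one."""
    return k * k * c_in + c_in * c_out


# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
standard = conv_params(3, 64, 128)
separable = depthwise_separable_params(3, 64, 128)
ratio = standard / separable
```

For this layer the standard convolution has 73,728 weights and the separable version 8,768, a reduction of roughly 8.4 times, which indicates the kind of saving a lightweight design is aiming for.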

Further, acquiring the URL to be detected comprises:

step 201: obtaining a traffic log from a gateway;

step 202: extracting the domain names in the traffic log, and using the blacklist and whitelist to filter out the domain names already known to contain or not to contain mining scripts;

step 203: acquiring the first-level and second-level subdirectories of the remaining untrusted domain names.
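
Steps 202 and 203 can be sketched as follows, under several assumptions that the source does not state: the traffic log is taken to be a list of full URLs, the lists are plain in-memory sets, and "first-level and second-level subdirectories" is read as the directory prefixes of depth one and two seen in logged paths. The function name `candidate_urls` is hypothetical.

```python
from urllib.parse import urlsplit


def candidate_urls(logged_urls, blacklist, whitelist):
    """Return {domain: sorted directory URLs of depth 1 and 2} for unlisted domains."""
    result = {}
    for url in logged_urls:
        parts = urlsplit(url)
        domain = parts.hostname
        if not domain or domain in blacklist or domain in whitelist:
            continue  # already classified, or not a parsable URL
        segments = [s for s in parts.path.split("/") if s]
        if not parts.path.endswith("/"):
            segments = segments[:-1]  # drop the trailing file name
        prefixes = result.setdefault(domain, set())
        for depth in (1, 2):
            if len(segments) >= depth:
                path = "/".join(segments[:depth])
                prefixes.add(f"{parts.scheme}://{domain}/{path}/")
    return {d: sorted(p) for d, p in result.items()}


logs = [
    "https://example.org/a/b/page.html",
    "https://example.org/a/c/",
    "https://known-bad.example/x/",
]
urls = candidate_urls(logs, blacklist={"known-bad.example"}, whitelist=set())
```

Filtering before path expansion keeps already-classified domains from being crawled again.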

Further, obtaining the input of the detection system comprises: acquiring the script files loaded by the page at the URL to be detected; selecting the JavaScript files, the Wasm modules, and the script tag elements in the HTML; and converting the samples into a unified representation and vectorizing them into images.
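
Selecting the script tag elements from a fetched page can be sketched with the Python standard library as below. This is an assumption-laden illustration: the class name `ScriptCollector` is hypothetical, the page is assumed to have been downloaded already, and Wasm modules are not covered because they are usually fetched by JavaScript at run time rather than referenced in the HTML.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect external script URLs and inline script bodies from HTML."""

    def __init__(self):
        super().__init__()
        self.external = []      # values of <script src="...">
        self.inline = []        # bodies of inline <script> elements
        self._in_script = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.external.append(src)
            else:
                self._in_script = True
                self._buffer = []

    def handle_data(self, data):
        if self._in_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            body = "".join(self._buffer).strip()
            if body:
                self.inline.append(body)
            self._in_script = False


page = (
    '<html><head><script src="/static/app.js"></script></head>'
    "<body><script>var x = 1;</script></body></html>"
)
collector = ScriptCollector()
collector.feed(page)
collector.close()
```

The external URLs would then be downloaded, and both kinds of script passed on to the unified representation and vectorization step.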

Further, inferring with the deep model trained to convergence comprises: inputting the image into the trained model and obtaining a detection result; and labeling the domain name to which the script belongs as a mining domain name or a non-mining domain name according to the detection result.

Further, a domain name detected to contain a mining script is added to the blacklist.

Further, a domain name detected not to contain a mining script is added to the whitelist.

In the preferred embodiment, the invention provides a lightweight encryption hijacking attack detection system which comprises two parts: model training and deployment detection. The model training part acquires JavaScript and WebAssembly samples through different channels, cleans and labels them, vectorizes the labeled samples into grayscale images of uniform size, and inputs the images into the model for training. The deployment detection part collects traffic logs from the gateway, extracts the domain names in the logs, and compares them with the domain names in the blacklist; if a domain name is not in the blacklist, it further obtains the subdirectories under the domain name, obtains the JavaScript and WebAssembly scripts hosted under those subdirectories through a crawler, vectorizes the obtained scripts, and inputs the samples into the model for prediction. Whether the domain name is added to the black/white list database is determined according to the result.

The model training part of the invention comprises:

1) A sample collection module: JavaScript and WebAssembly samples are obtained from the PublicWWW search engine, the VirusShare malicious code sharing website, and GitHub open source repositories.

2) A data cleaning and labeling module: the collected samples are deduplicated, unusable samples are removed, and the samples are labeled through VirusTotal to obtain mining samples and non-mining samples.

3) A sample vectorization module: the labeled samples are vectorized, with program semantics added during vectorization to improve accuracy, and finally formed into grayscale images of uniform size through a mapping from numerical values to gray levels and padding.

4) Model design and training: to keep the model lightweight, a model lightweighting technique is adopted to reduce the number of model parameters and the model size, and the grayscale images obtained are input into the designed model. After the training curve converges, the model parameters are saved.

The modules of the model training part perform the following steps:

(1) establishing an encryption hijacking attack detection deep learning model;

(1.1) training sample collection:

(1.1a) obtaining a list of URLs containing JavaScript and WebAssembly scripts from the PublicWWW search engine.

(1.1b) crawling the JavaScript and WebAssembly scripts hosted at the URLs obtained in step (1.1a).

(1.1c) analyzing the malicious code provided by the VirusShare website and selecting the JavaScript and WebAssembly scripts from it.

(1.1d) obtaining JavaScript and WebAssembly samples from relevant open source projects on GitHub.

(1.2) data cleaning and labeling:

(1.2a) deleting duplicate samples from the sample set obtained in steps (1.1b), (1.1c) and (1.1d).

(1.2b) labeling the sample set obtained in step (1.2a) through VirusTotal, and selecting the malicious mining scripts and the normal, safe scripts.

(1.3) sample vectorization:

(1.3a) converting the labeled sample set obtained in step (1.2b) into a unified text representation by using compiling and disassembling tools.

(1.3b) analyzing the text representation from step (1.3a), locating the section in which the mining semantics of a malicious mining sample reside, and truncating all samples to that same section.

(1.3c) converting the text truncated in step (1.3b) into images of uniform size through a mapping from numerical values to gray levels and padding. These images constitute the training set of the model.

(1.4) model design and training:

(1.4a) adding a model lightweighting technique on top of a conventional convolutional neural network to reduce the number of model parameters and the model size.

(1.4b) training the model with the training set obtained in step (1.3c), and tuning parameters appropriately to improve prediction performance.

(1.4c) after the training curve converges, the model parameters are saved.

The deployment detection part of the present invention comprises:

1) A URL acquisition module: the daily traffic log is analyzed, the domain name information in it is extracted and filtered, and the sub-URL paths of each domain name are then obtained.

2) System input: the JavaScript and WebAssembly scripts loaded by the page at each URL are acquired and vectorized into grayscale images.

3) Deep model inference: the grayscale images are input into the model trained by the model training part to obtain a detection result.

4) Black/white list database: domain names found to host mining scripts are added to the blacklist, and the blacklist and whitelist are consulted for a quick decision before the URL paths are obtained in module 1).

The modules of the deployment detection part perform the following steps:

(2) deploying an encryption hijacking attack detection system;

(2.1) acquiring the URL to be detected:

(2.1a) obtaining a traffic log from the gateway;

(2.1b) extracting the domain names in the traffic log, and using the blacklist and whitelist to filter out the domain names already known to contain or not to contain mining scripts;

(2.1c) acquiring the first-level and second-level subdirectories of the remaining untrusted domain names.

(2.2) obtaining input of the detection system:

(2.2a) acquiring the script files loaded by the page at each URL obtained in step (2.1);

(2.2b) selecting the JavaScript files, the Wasm modules, and the script tag elements in the HTML;

(2.2c) converting the samples into a unified representation and vectorizing into pictures.

(2.3) deep model inference detection:

(2.3a) inputting the images obtained in step (2.2c) into the trained model and obtaining a detection result;

(2.3b) marking the domain name to which the script belongs as a mining domain name or a non-mining domain name according to the detection result;

(2.4) blacklist/whitelist database:

(2.4a) adding the mining domain name obtained in the step (2.3b) into a blacklist database, and adding the non-mining domain name into a whitelist database;

(2.4b) checking the domain names to be detected in step (2.1b) against the black/white list: if a domain name is in the list, it is labeled directly; if not, processing continues from step (2.1c).
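
The black/white list database of step (2.4) can be sketched as a single table keyed by domain name. The source does not describe the storage engine or schema, so the use of SQLite, the table name `domain_list`, and the helper names `open_list_db`, `record`, and `lookup` are all assumptions made for illustration.

```python
import sqlite3


def open_list_db(path=":memory:"):
    """Open (or create) the list database; one row per classified domain."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS domain_list ("
        " domain TEXT PRIMARY KEY,"
        " verdict TEXT NOT NULL CHECK (verdict IN ('mining', 'non-mining')))"
    )
    return conn


def record(conn, domain, verdict):
    """Step (2.4a): store or update the verdict for a domain."""
    conn.execute(
        "INSERT OR REPLACE INTO domain_list (domain, verdict) VALUES (?, ?)",
        (domain, verdict),
    )
    conn.commit()


def lookup(conn, domain):
    """Step (2.4b): return the stored verdict, or None if the domain is unlisted."""
    row = conn.execute(
        "SELECT verdict FROM domain_list WHERE domain = ?", (domain,)
    ).fetchone()
    return row[0] if row else None


conn = open_list_db()
record(conn, "miner.example", "mining")
record(conn, "news.example", "non-mining")
```

A `None` result from `lookup` corresponds to the "if not" branch of step (2.4b), where processing continues from step (2.1c).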

Compared with the prior art, the invention has the following substantive features and advantages:

1. The invention provides a lightweight encryption hijacking attack detection system to detect and resist rampant encryption hijacking attacks. It adopts a deep learning model lightweighting technique and a sample vectorization technique that incorporates program semantics, so that a large volume of traffic can be inspected effectively, with good support for heterogeneous scripts and good resistance to deception.

2. The invention can detect multiple kinds of mining scripts at the same time while occupying only a small amount of the user's computer resources, thereby defending against encryption hijacking attacks.

3. To address the shortcomings of existing detection systems, the invention uses deep learning to classify mining programs that have been converted into images. Its innovation is that the semantic features of the mining program are embedded in the image by a malicious-code vectorization technique, so that higher classification accuracy is achieved. It also handles heterogeneous inputs: mining programs written in the two mainstream languages (JavaScript and WebAssembly) can be detected at the same time. Its practicality is that it can be deployed at a campus or enterprise gateway to monitor and inspect daily traffic, and domain names found to host malicious mining scripts are added to a blacklist database. The detection system plays an important role in campus and enterprise network defense and is of positive significance for the real-time detection of, and defense against, encryption hijacking attacks.

The conception, the specific structure and the technical effects of the present invention will be further described with reference to the accompanying drawings to fully understand the objects, the features and the effects of the present invention.

Drawings

FIG. 1 is a schematic diagram of a model training process according to a preferred embodiment of the present invention;

FIG. 2 is a deployment detection flow diagram in accordance with a preferred embodiment of the present invention.

Detailed Description

The technical contents of the preferred embodiments of the present invention will be more clearly and easily understood by referring to the drawings attached to the specification. The present invention may be embodied in many different forms of embodiments and the scope of the invention is not limited to the embodiments set forth herein.

In the drawings, structurally identical elements are represented by like reference numerals, and structurally or functionally similar elements are represented by like reference numerals throughout the several views. The size and thickness of each component shown in the drawings are arbitrarily illustrated, and the present invention is not limited to the size and thickness of each component. The thickness of the components may be exaggerated where appropriate in the figures to improve clarity.

The invention provides a browser mining detection system, which comprises training a deep learning detection module and deploying the deep learning module in the system. Training the deep learning module comprises: collecting malicious mining JavaScript and WebAssembly scripts; data cleaning, namely screening out samples suitable for training; vectorizing the samples into grayscale images; and designing and training a deep learning neural network for identifying mining scripts. Deploying the deep learning browser mining detection system comprises: acquiring the URL to be detected; obtaining the input of the detection system; inferring with the deep model trained to convergence; and adding the domain name to the blacklist or the whitelist according to whether it is inferred to contain a mining script.

As shown in fig. 1, collecting the mining script data set comprises: obtaining a list of URLs containing JavaScript and WebAssembly scripts from the PublicWWW search engine; crawling the JavaScript and WebAssembly scripts hosted at those URLs; analyzing the malicious code provided by the VirusShare website and selecting the JavaScript and WebAssembly scripts from it; and acquiring JavaScript and WebAssembly samples from relevant open source projects on GitHub.

The data cleaning and labeling comprises: deleting duplicate samples from the training data set; and labeling the sample set through VirusTotal, selecting the malicious mining scripts and the normal, safe scripts.

The sample vectorization processing includes:

step 103: converting the truncated text into images of uniform size through a mapping from numerical values to gray levels and padding. These images constitute the training set of the model.

The model design and training comprises the following steps: on the basis of the traditional convolutional neural network, a model lightweight technology is added, and model parameters and volume are reduced; training the model by using a training set, and properly adjusting parameters to improve the prediction performance; after the training curve converges, the model parameters are saved.

The acquiring of the URL to be detected comprises:

step 104: obtaining a traffic log from a gateway;

step 105: extracting the domain names in the traffic log, and using the blacklist and whitelist to filter out the domain names already known to contain or not to contain mining scripts;

step 106: acquiring the first-level and second-level subdirectories of the remaining untrusted domain names.

Obtaining the input of the detection system comprises: acquiring the script files loaded by the page at the URL to be detected; selecting the JavaScript files, the Wasm modules, and the script tag elements in the HTML; and converting the samples into a unified representation and vectorizing them into images.

A preferred deployment embodiment of the present invention is comprised of a URL acquisition module, a system input module, a model inference detection module, and a blacklist database.

As shown in fig. 2, the system takes the traffic log of the campus gateway of University A as its original input. The system first extracts the set of domain names to be detected (B) from the traffic log, and then compares the set (B) with the black/white list in the database. If a domain name is in the black/white list, it is labeled directly as a mining or non-mining domain name; the remaining domain names, which are not in the black/white list database, form the set to be detected (B'). The first-level and second-level subdirectory URL lists under the domain names (B') are then acquired. The script files (C) loaded by the URLs in these lists are obtained through a crawler, and the JavaScript files and WebAssembly modules among them are selected to form the scripts to be tested (C').

The scripts to be tested (C') are vectorized into grayscale images of uniform size. The grayscale images are then input into the trained deep model of FIG. 1 for inference to obtain an inference result. The domain names (B') are classified according to the inference result: if the script set (C'-1) hosted on a specific domain name (B'-1) among the domain names (B') contains a script predicted to be a mining script, the domain name (B'-1) is labeled as a mining domain name and added to the blacklist; if all scripts (C'-1) of the domain name (B'-1) are judged to be normal scripts, the domain name (B'-1) is added to the whitelist.
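
The per-domain decision rule described above can be sketched as follows. The input format is an assumption: `predictions` maps each domain name in (B') to a list of booleans, one per script in its script set, where `True` means the model predicted a mining script. Leaving a domain with no scripts unclassified is also an assumption, since the source does not cover that case.

```python
def classify_domains(predictions):
    """Apply the rule: any mining script -> blacklist; all normal -> whitelist."""
    blacklist, whitelist = set(), set()
    for domain, flags in predictions.items():
        if any(flags):
            blacklist.add(domain)
        elif flags:  # at least one script, and every script judged normal
            whitelist.add(domain)
        # Domains with no scripts stay unclassified (assumed behavior).
    return blacklist, whitelist


blacklist, whitelist = classify_domains({
    "a.example": [False, True],
    "b.example": [False, False],
    "c.example": [],
})
```

A single positive prediction is enough to blacklist a domain, so the rule favors catching miners over avoiding false alarms.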

The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims

1. A lightweight encryption hijacking attack detection system based on deep learning, characterized by comprising model training and deployment detection: the model training comprises collecting a malicious mining script data set; data cleaning, namely screening out samples suitable for training; vectorizing the samples into grayscale images; and designing and training a deep learning neural network for identifying mining scripts; the deployment detection comprises acquiring the URL to be detected; obtaining the input of the detection system; inferring with the deep model trained to convergence; and adding the domain name to a blacklist or a whitelist.

2. The deep learning-based lightweight encryption hijacking attack detection system of claim 1, wherein collecting the malicious mining script data set comprises: obtaining a list of URLs containing JavaScript and WebAssembly scripts from the PublicWWW search engine; crawling the JavaScript and WebAssembly scripts hosted at those URLs; analyzing the malicious code provided by the VirusShare website and selecting the JavaScript and WebAssembly scripts from it; and acquiring JavaScript and WebAssembly samples from relevant open source projects on GitHub.

3. The deep learning-based lightweight encryption hijacking attack detection system of claim 1, wherein the data cleaning comprises: deleting duplicate samples from the training data set; and labeling the sample set through VirusTotal, selecting the malicious mining scripts and the normal, safe scripts.

4. The deep learning-based lightweight encryption hijacking attack detection system of claim 1, wherein vectorizing the samples into a gray scale map comprises the steps of:

5. The deep learning-based lightweight encryption hijacking attack detection system of claim 1, wherein designing and training the deep learning neural network for identifying mining scripts comprises: adding a model lightweighting technique on top of a conventional convolutional neural network to reduce the number of model parameters and the model size; training the model with the training set and tuning parameters to improve prediction performance; and saving the model parameters after the training curve converges.

6. The deep learning-based lightweight encryption hijacking attack detection system of claim 1, wherein said obtaining the URL to be detected comprises:

step 201: obtaining a traffic log from a gateway;

7. The deep learning-based lightweight encryption hijacking attack detection system of claim 1, wherein obtaining the input of the detection system comprises: acquiring the script files loaded by the page at the URL to be detected; selecting the JavaScript files, the Wasm modules, and the script tag elements in the HTML; and converting the samples into a unified representation and vectorizing them into images.

8. The deep learning-based lightweight encryption hijacking attack detection system of claim 1, wherein inferring with the deep model trained to convergence comprises: inputting the image into the trained model and obtaining a detection result; and labeling the domain name to which the script belongs as a mining domain name or a non-mining domain name according to the detection result.

9. The deep learning-based lightweight encryption hijacking attack detection system of claim 1, wherein a domain name detected to contain a mining script is added to the blacklist.

10. The deep learning-based lightweight encryption hijacking attack detection system of claim 1, wherein a domain name detected not to contain a mining script is added to the whitelist.