CN114595454B - Malicious JS script detection method based on mixed analysis and feature fusion

Info

Publication number
CN114595454B
Authority
CN
China
Prior art keywords
malicious
javascript
feature fusion
sequence
detection
Legal status
Active
Application number
CN202210252529.7A
Other languages
Chinese (zh)
Other versions
CN114595454A (en)
Inventor
孙聪
乔新博
陈亮
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
2022-03-11
Filing date
2022-03-11
Publication date
2024-04-02
Application filed by Xidian University
Priority to CN202210252529.7A
Publication of CN114595454A
Application granted
Publication of CN114595454B

Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity > G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems > G06F21/55 Detecting local intrusion or implementing counter-measures > G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements > G06F21/562 Static detection > G06F21/563 Static detection by source code analysis
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F8/00 Arrangements for software engineering > G06F8/40 Transformation of program code > G06F8/41 Compilation > G06F8/42 Syntactic analysis
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F8/00 Arrangements for software engineering > G06F8/40 Transformation of program code > G06F8/41 Compilation > G06F8/42 Syntactic analysis > G06F8/425 Lexical analysis
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N3/00 Computing arrangements based on biological models > G06N3/02 Neural networks > G06N3/04 Architecture, e.g. interconnection topology > G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N3/00 Computing arrangements based on biological models > G06N3/02 Neural networks > G06N3/04 Architecture, e.g. interconnection topology > G06N3/045 Combinations of networks
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N3/00 Computing arrangements based on biological models > G06N3/02 Neural networks > G06N3/04 Architecture, e.g. interconnection topology > G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N3/00 Computing arrangements based on biological models > G06N3/02 Neural networks > G06N3/08 Learning methods

Abstract

The invention provides a malicious JavaScript script detection method based on mixed analysis and feature fusion, which addresses the low detection accuracy of the prior art. The method comprises the following steps: (1) obtaining a training sample set and a test sample set; (2) constructing a malicious JavaScript script detection network model based on feature fusion; (3) iteratively training the malicious JavaScript script detection network model based on feature fusion; and (4) obtaining the detection result for malicious JavaScript scripts. The model fuses dynamic and static features and then classifies them. This avoids a problem of the prior art, in which dynamic and static features are directly concatenated and fed into a random forest model, destroying the sequence information among features, and it thereby effectively improves the detection accuracy for malicious JavaScript scripts.

Description

Malicious JS script detection method based on mixed analysis and feature fusion
Technical Field
The invention belongs to the technical field of information security and relates to a malicious JavaScript script detection method, in particular to one based on mixed analysis and feature fusion, which can be used to detect malicious JavaScript scripts protected by complex obfuscation.
Background
JavaScript, one of the most popular front-end scripting languages in the world, plays an important role on the Internet. It operates the browser through application program interfaces (APIs) and is used to enhance interfaces, validate form data, check browser information, respond to browser operations, manage login credentials, and so on.
The dynamic nature of JavaScript greatly simplifies browser front-end development. First, JavaScript is cross-platform: it needs only browser support and does not depend on the operating system. Second, JavaScript is dynamic, with a simple and flexible syntax, which makes it well suited to complex and changing browser tasks. In addition, JavaScript is an interpreted language that is executed as it is interpreted, without precompilation.
Although the cross-platform and dynamic nature of JavaScript is a great advantage in browser front-end development, it also makes JavaScript one of the main carriers of attacks. Malicious attacks include drive-by download attacks, cross-site scripting (XSS) attacks, heap spraying attacks, and clickjacking attacks. These attacks can steal user data, create self-replicating worms, and make the user's browser download malware, posing a serious threat to the information security of Internet users. To carry out such an attack, a JavaScript script must also call various APIs. Studying how to detect malicious JavaScript scripts accurately and effectively has therefore become an important task.
At present, detection techniques for malicious JavaScript scripts fall into three categories: those based on static analysis, those based on dynamic analysis, and those based on mixed (hybrid) analysis. Static-analysis-based detection analyzes a JavaScript script without running it, using techniques such as source code analysis, lexical analysis, and syntax tree analysis, extracts static features of the script, and uses them to detect malicious scripts. Dynamic-analysis-based detection uses dynamic instrumentation, sandbox execution, and similar means to extract dynamic features of the script, such as the number of file reads and writes, function calls, and tracked variable values, and judges the maliciousness of the script from these features. Mixed-analysis-based detection extracts static features through static analysis and dynamic features through dynamic analysis, and judges the maliciousness of a JavaScript script from both.
With the development of machine learning and deep learning, the convolutional neural network (CNN), the bidirectional long short-term memory network (BiLSTM), and the random forest model have also been applied to malicious JavaScript script detection. A CNN is a deep feedforward neural network that includes convolution operations; it takes the spatial distribution of its input into account and can therefore capture sequence information among input features. A BiLSTM is a recurrent neural network: the input of an LSTM layer includes not only the output of the input layer but also the output of the LSTM layer at the previous time step, so it can capture sequence information among input features. The random forest algorithm combines multiple decision trees through ensemble learning; because features are selected at random while the forest is generated, sequence information among adjacent features is destroyed.
He Xincheng, Xu Lei, et al., in the paper "Malicious JavaScript Code Detection Based on Hybrid Analysis" (APSEC, 2018, pp. 365-374), disclose a malicious JavaScript script detection method based on mixed analysis. The method first collects the source code of relevant web pages and extracts the JavaScript scripts in the source code and those embedded in HTML documents. Second, it builds an abstract syntax tree for each JavaScript script, analyzes the nodes of each tree, and extracts features from them; the 13 static features extracted include the number of encoding operations, the number of redirection operations, the number of spaces, and the total number of lines of code. It then instruments each JavaScript script, overriding the basic operations to be monitored at run time, and collects run-time statistics of the script as dynamic features; the 12 dynamic features extracted include the number of object reads and writes and the number of binary operations. Next, the dynamic and static features are converted into feature vectors, which serve as the training set and test set for a random forest model. Finally, the random forest model is tested on the test set and its metrics are evaluated.
This method has two defects. First, the extracted dynamic and static features are counts of how often a given identifier or operation occurs. These are frequency features of the JavaScript script; they ignore the order in which operations and identifiers appear in the script, so the code structure information of the script is lost. Second, the method fuses features by directly concatenating the dynamic and static features obtained from mixed analysis before feeding them into the random forest model. This fusion approach is widely applicable, but a random forest model cannot directly analyze sequence information among features, and direct concatenation also disturbs that sequence information. For these reasons, the detection accuracy of this method is limited.
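The first defect can be illustrated with a small Python sketch. The two traces below are hypothetical and are not taken from the cited paper; they only show that a frequency feature cannot tell apart two call sequences that differ in order.

```python
from collections import Counter

# Hypothetical API call traces: the same calls, issued in a different order.
trace_a = ["createElement", "setAttribute", "appendChild", "eval"]
trace_b = ["eval", "createElement", "appendChild", "setAttribute"]


def frequency_features(trace):
    """Count occurrences of each call; the order of calls is discarded."""
    return Counter(trace)


same_counts = frequency_features(trace_a) == frequency_features(trace_b)
same_order = trace_a == trace_b
print(same_counts, same_order)
```

The counts are identical while the sequences are not, so a classifier that sees only the counts has no access to the ordering.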
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a malicious JS script detection method based on mixed analysis and feature fusion, which addresses the low detection accuracy of the prior art.
To achieve this aim, the technical scheme adopted by the invention comprises the following steps:
(1) Acquire a training sample set and a test sample set:
(1a) Obtain $V$ JavaScript scripts $J = \{F_v \mid 1 \le v \le V\}$ and run each JavaScript script $F_v$. While $F_v$ runs, dynamically monitor its calls to the application program interface (API) to obtain the API call sequences $S_1 = \{W_1^v \mid 1 \le v \le V\}$ corresponding to $J$. At the same time, perform static syntax analysis on $F_v$ to obtain its abstract syntax tree $AST_v$, and visit the nodes of $AST_v$ with a depth-first traversal algorithm to obtain the syntax unit sequences $S_2 = \{W_2^v \mid 1 \le v \le V\}$ corresponding to $J$, where $V > 10000$, $F_v$ denotes the $v$-th JavaScript script, and $W_1^v$ and $W_2^v$ denote the API call sequence and the syntax unit sequence corresponding to $F_v$, respectively;
(1b) Using a Word2Vec model, convert each API call sequence $W_1^v$ and each syntax unit sequence $W_2^v$ into word vectors, obtaining the API call sequence word vectors $T_1 = \{X_1^v \mid 1 \le v \le V\}$ of $S_1$ and the syntax unit sequence word vectors $T_2 = \{X_2^v \mid 1 \le v \le V\}$ of $S_2$;
(1c) Label each API call sequence word vector $X_1^v$ and each syntax unit sequence word vector $X_2^v$ with the same label. More than half of the API call sequence word vectors in $T_1$ and more than half of the syntax unit sequence word vectors in $T_2$, together with the label shared by each pair of vectors, form the training sample set $Q_1$; the remaining API call sequence word vectors in $T_1$ and the remaining syntax unit sequence word vectors in $T_2$, together with the label shared by each pair of vectors, form the test sample set $Q_2$;
(2) Construct a malicious JavaScript script detection network model based on feature fusion:
Construct a malicious JavaScript script detection network model comprising a convolutional neural network (CNN) and a bidirectional long short-term memory network (BiLSTM) arranged in parallel, whose outputs are connected in sequence to a feature fusion module and a feature classifier $E$. The CNN comprises a convolution layer and a max-pooling layer stacked on each other, and the activation function of the convolution kernels in the convolution layer is ReLU; the BiLSTM comprises a forward LSTM layer and a backward LSTM layer stacked on each other, and the activation function of the LSTM units in the LSTM layers is sigmoid; $E$ comprises a fully connected layer and an output layer with a sigmoid activation function;
(3) Iteratively train the malicious JavaScript script detection network model based on feature fusion:
(3a) Initialize the iteration counter $i$ and the maximum number of iterations $I$, with $I \ge 200$; initialize the weight matrix and the bias matrix of the malicious JavaScript script detection network model based on feature fusion; and let $i = 0$;
(3b) Feed the training sample set $Q_1$ into the malicious JavaScript script detection network model based on feature fusion for forward propagation. The CNN extracts the high-dimensional features of the syntax unit sequence in each training sample, while the BiLSTM extracts the high-dimensional features of the API call sequence in each training sample. The feature fusion module fuses each high-dimensional feature extracted by the CNN with the corresponding high-dimensional feature extracted by the BiLSTM, and the feature classifier $E$ maps each fused high-dimensional feature to a vector and applies the sigmoid function to obtain the prediction probability of each training sample;
(3c) Use the cross-entropy loss function to compute the loss value $L_i$ of the malicious JavaScript script detection network model based on feature fusion in the current iteration; use back-propagation to compute, from $L_i$, the weight matrix gradient and the bias matrix gradient of the model; then update the weight matrix and the bias matrix by gradient descent;
(3d) Judge whether $i > I$ holds. If so, the trained malicious JavaScript script detection model based on feature fusion is obtained; otherwise, let $i = i + 1$ and return to step (3b);
(4) Obtain the detection result for malicious JavaScript scripts:
Feed the test sample set $Q_2$ into the trained malicious JavaScript script detection model based on feature fusion for forward propagation to obtain the prediction probability of each test sample. If the prediction probability of the $k$-th test sample reaches the decision threshold, the JavaScript script corresponding to the $k$-th test sample in $Q_2$ is malicious; otherwise, the JavaScript script is normal.
Compared with the prior art, the invention has the following advantages:
1. The malicious JavaScript script detection network model constructed by the invention comprises a CNN and a BiLSTM arranged in parallel, followed in sequence by a feature fusion module connected to their outputs. While the model is trained and while detection results for malicious JavaScript scripts are obtained, the convolution layer of the CNN captures the sequence information of syntax units over a certain length through its convolution kernels, and the forward and backward LSTM layers of the BiLSTM capture the sequence information of API calls over a certain length. Because the CNN and the BiLSTM each process a different sequence, the invention avoids the defect of the prior art, in which directly concatenating dynamic and static features and feeding them into a random forest model destroys the sequence information among features, and it effectively improves the detection accuracy for malicious JavaScript scripts.
2. In the invention, the API call sequence obtained by dynamically monitoring the API-calling behavior of each JavaScript script at run time is combined with the syntax unit sequence obtained by static syntax analysis of each JavaScript script, yielding a training sample set that contains both dynamic and static information. Both the API call sequence and the syntax unit sequence in the training sample set carry order information. This avoids the defect of the prior art, which extracts only frequency features of the JavaScript script and loses the order information of its code, and further improves the detection accuracy for malicious JavaScript scripts.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention.
Detailed Description
The invention is described in further detail below with reference to the attached drawings and to specific embodiments:
Referring to FIG. 1, the invention comprises the following steps:
Step 1) Obtain a training sample set and a test sample set:
(1a) Use a web crawler to obtain $V$ JavaScript scripts $J = \{F_v \mid 1 \le v \le V\}$ from the Internet. Modify a sandbox that implements the JavaScript APIs by adding a monitoring function to the code of each API in the sandbox, and run each JavaScript script $F_v$ in the modified sandbox. When $F_v$ calls an API, the monitoring function in that API's code is also called and records the name of the called API, yielding the API call sequences $S_1 = \{W_1^v \mid 1 \le v \le V\}$ corresponding to $J$. At the same time, use the Esprima tool to perform static syntax analysis on $F_v$ to obtain its abstract syntax tree $AST_v$, and visit the nodes of $AST_v$ with a depth-first traversal algorithm to obtain the syntax unit sequences $S_2 = \{W_2^v \mid 1 \le v \le V\}$ corresponding to $J$, where $V = 36000$, $F_v$ denotes the $v$-th JavaScript script, and $W_1^v$ and $W_2^v$ denote the API call sequence and the syntax unit sequence corresponding to $F_v$, respectively;
(1b) Word2Vec is a model for word vector conversion. It comprises an input layer, a hidden layer, and an output layer connected in sequence, and the activation function of the output layer is the softmax function. Using the Word2Vec model, convert each API call sequence $W_1^v$ and each syntax unit sequence $W_2^v$ into word vectors, obtaining the API call sequence word vectors $T_1 = \{X_1^v \mid 1 \le v \le V\}$ of $S_1$ and the syntax unit sequence word vectors $T_2 = \{X_2^v \mid 1 \le v \le V\}$ of $S_2$;
(1c) Label each API call sequence word vector $X_1^v$ and each syntax unit sequence word vector $X_2^v$ with the same label. The reason is that $X_1^v$ and $X_2^v$ both come from the same JavaScript script $F_v$, so they should carry the same label. Then 80% of the API call sequence word vectors in $T_1$ and the corresponding 80% of the syntax unit sequence word vectors in $T_2$, together with the label shared by each pair of vectors, form the training sample set $Q_1$; the remaining 20% of the API call sequence word vectors in $T_1$ and the remaining syntax unit sequence word vectors in $T_2$, together with the label shared by each pair of vectors, form the test sample set $Q_2$;
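The two extraction paths of step (1a) can be sketched in Python under stated assumptions. The sandbox here is a plain dictionary of stub functions, `make_monitored` and `syntax_unit_sequence` are hypothetical helper names, and the AST is a hand-written ESTree-style dictionary of the kind a parser such as Esprima produces. Treating each node's `type` as a syntax unit is also an assumption, since the text does not define the unit precisely.

```python
def make_monitored(name, func, log):
    """Wrap an API function so that every call appends its name to `log`."""
    def wrapper(*args, **kwargs):
        log.append(name)
        return func(*args, **kwargs)
    return wrapper


def syntax_unit_sequence(node):
    """Pre-order depth-first walk that collects the `type` of each AST node."""
    sequence = []
    if isinstance(node, dict):
        if "type" in node:
            sequence.append(node["type"])
        for value in node.values():
            sequence.extend(syntax_unit_sequence(value))
    elif isinstance(node, list):
        for child in node:
            sequence.extend(syntax_unit_sequence(child))
    return sequence


# Dynamic side: a toy "sandbox" whose APIs are stubs wrapped with the monitor.
api_log = []
sandbox = {
    name: make_monitored(name, (lambda *args: None), api_log)
    for name in ("document.write", "setTimeout")
}
sandbox["document.write"]("<p>hi</p>")
sandbox["setTimeout"]("callback", 100)

# Static side: hand-written ESTree-style AST for the one-line script alert(1).
ast = {
    "type": "Program",
    "body": [{
        "type": "ExpressionStatement",
        "expression": {
            "type": "CallExpression",
            "callee": {"type": "Identifier", "name": "alert"},
            "arguments": [{"type": "Literal", "value": 1}],
        },
    }],
}
syntax_units = syntax_unit_sequence(ast)
print(api_log)
print(syntax_units)
```

Both outputs are ordered lists, which is the property the later sequence models rely on; a real pipeline would then map each list to word vectors.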
Step 2) Construct a malicious JavaScript script detection network model based on feature fusion:
Construct a malicious JavaScript script detection network model comprising a convolutional neural network (CNN) and a bidirectional long short-term memory network (BiLSTM) arranged in parallel, whose outputs are connected in sequence to a feature fusion module and a feature classifier $E$. The CNN comprises a convolution layer and a max-pooling layer stacked on each other; the convolution layer contains 4 convolution kernels with sizes 7, 15, 35 and 55, respectively, and the activation function of each convolution kernel is ReLU. The BiLSTM comprises a forward LSTM layer and a backward LSTM layer stacked on each other; the forward and backward LSTM layers each contain 128 LSTM units, and the activation function of the LSTM units is sigmoid. $E$ comprises a fully connected layer and an output layer with a sigmoid activation function;
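The CNN branch and the fusion step can be illustrated with a deliberately small pure-Python sketch. It is not the Keras model of the embodiment: the input is a sequence of scalars rather than word vectors, the kernels have widths 2 and 3 instead of 7, 15, 35 and 55, pooling is taken over the whole sequence, and the BiLSTM output is a placeholder vector. All function names are hypothetical.

```python
def relu(x):
    """Rectified linear unit."""
    return x if x > 0.0 else 0.0


def conv1d_max_pool(sequence, kernel):
    """Slide one kernel along the sequence, apply ReLU, keep the maximum."""
    width = len(kernel)
    activations = [
        relu(sum(w * x for w, x in zip(kernel, sequence[start:start + width])))
        for start in range(len(sequence) - width + 1)
    ]
    return max(activations)


def cnn_features(sequence, kernels):
    """One pooled feature per kernel; kernels may have different widths."""
    return [conv1d_max_pool(sequence, kernel) for kernel in kernels]


def fuse(cnn_out, bilstm_out):
    """Feature fusion by concatenating the outputs of the two branches."""
    return list(cnn_out) + list(bilstm_out)


sequence = [1.0, -2.0, 3.0, 0.5, -1.0]        # toy syntax unit encoding
kernels = [[1.0, 1.0], [1.0, 0.0, -1.0]]      # two kernel widths: 2 and 3
cnn_out = cnn_features(sequence, kernels)
bilstm_out = [0.2, -0.1]                      # placeholder for the BiLSTM branch
fused = fuse(cnn_out, bilstm_out)
print(cnn_out, fused)
```

Kernels of different widths respond to patterns of different lengths, and concatenation keeps the two branch outputs side by side for the classifier instead of mixing raw dynamic and static features before any sequence modelling.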
Step 3) Iteratively train the malicious JavaScript script detection network model based on feature fusion:
(3a) Initialize the iteration counter $i$ and the maximum number of iterations $I$, with $I \ge 200$; initialize the weight matrix and the bias matrix of the malicious JavaScript script detection network model based on feature fusion; and let $i = 0$;
(3b) Feed the training sample set $Q_1$ into the malicious JavaScript script detection network model based on feature fusion for forward propagation. Convolution kernels of different sizes in the convolution layer of the CNN convolve along the syntax unit sequence, the max-pooling layer reduces the dimensionality of the convolution layer's output, and the output of the pooling layer serves as the high-dimensional feature of the syntax unit sequence in each training sample. Meanwhile, the forward LSTM layer of the BiLSTM processes the API call sequence, the output of the forward LSTM layer serves as the input of the backward LSTM layer, and the output of the backward LSTM layer serves as the high-dimensional feature of the API call sequence in each training sample. The feature fusion module fuses each high-dimensional feature extracted by the CNN with the corresponding high-dimensional feature extracted by the BiLSTM by concatenating the two, and the feature classifier $E$ maps each fused high-dimensional feature to a vector and applies the sigmoid function to obtain the prediction probability of each training sample;
(3c) Use the cross-entropy loss function to compute, from the prediction probabilities and the true labels of the training samples, the loss value $L_i$ of the malicious JavaScript script detection network model based on feature fusion in the current iteration; use back-propagation to compute, from $L_i$, the weight matrix gradient and the bias matrix gradient of the model; then update the weight matrix and the bias matrix by gradient descent, where each matrix is updated by subtracting from it the product of its learning rate and its gradient;
(3d) Judge whether $i > I$ holds. If so, the trained malicious JavaScript script detection model based on feature fusion is obtained; otherwise, let $i = i + 1$ and return to step (3b);
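The training loop of step 3 can be sketched for the final sigmoid output alone. The exact loss and update formulas are not reproduced in the text, so this sketch assumes the standard mean binary cross-entropy, $L = -\frac{1}{M}\sum_{m=1}^{M}\left[y_m \log \hat{y}_m + (1 - y_m)\log(1 - \hat{y}_m)\right]$, and the plain gradient descent update $\theta \leftarrow \theta - \alpha \nabla_\theta L$ with separate learning rates for weights and bias. The symbols $M$, $y_m$, $\hat{y}_m$ and $\theta$, the function names, and the toy fused features are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Sigmoid output for one fused feature vector."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def cross_entropy(labels, probs):
    """Mean binary cross-entropy over all samples."""
    total = sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(labels, probs))
    return -total / len(labels)


def train(samples, labels, alpha_1=0.1, alpha_2=0.1, max_iter=200):
    """Gradient descent on one sigmoid unit; returns weights, bias, losses."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    losses = []
    m = len(samples)
    for _ in range(max_iter):
        probs = [predict(weights, bias, x) for x in samples]   # forward pass
        losses.append(cross_entropy(labels, probs))
        errors = [p - y for p, y in zip(probs, labels)]        # (p - y) per sample
        grad_w = [sum(e * x[j] for e, x in zip(errors, samples)) / m
                  for j in range(len(weights))]
        grad_b = sum(errors) / m
        weights = [w - alpha_1 * g for w, g in zip(weights, grad_w)]
        bias -= alpha_2 * grad_b
    return weights, bias, losses


# Toy fused features: label 1 = malicious, label 0 = normal.
samples = [[3.5, 4.0, 0.2, -0.1], [0.1, 0.0, -0.3, 0.4],
           [2.9, 3.1, 0.5, 0.0], [0.0, 0.2, 0.1, -0.2]]
labels = [1, 0, 1, 0]
weights, bias, losses = train(samples, labels)
print(losses[0], losses[-1])
```

In the full model the same loss would be back-propagated through the fusion module into both the CNN and the BiLSTM; the sketch only covers the last layer to keep the arithmetic visible.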
Step 4) Obtain the detection result for malicious JavaScript scripts:
Feed the test sample set $Q_2$ into the trained malicious JavaScript script detection model based on feature fusion for forward propagation to obtain the prediction probability of each test sample. If the prediction probability of the $k$-th test sample reaches the decision threshold, the JavaScript script corresponding to the $k$-th test sample in $Q_2$ is malicious; otherwise, the JavaScript script is normal.
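The decision step can be sketched as a threshold on the predicted probability followed by an accuracy count. The threshold of 0.5 and the strict comparison are assumptions, since the exact condition is not reproduced in the text; the probabilities and labels are made-up values.

```python
def classify(probabilities, threshold=0.5):
    """Map each predicted probability to 1 (malicious) or 0 (normal)."""
    return [1 if p > threshold else 0 for p in probabilities]


def accuracy(predicted, actual):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


probs = [0.97, 0.08, 0.61, 0.45, 0.30]   # hypothetical model outputs
labels = [1, 0, 1, 1, 0]                 # hypothetical true labels
predicted = classify(probs)
acc = accuracy(predicted, labels)
print(predicted, acc)
```

Here the fourth sample is a missed detection, so four of the five predictions are correct.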
To verify the technical effect of the malicious JS script detection method based on mixed analysis and feature fusion, its detection accuracy was evaluated by simulation. 18000 normal JavaScript scripts were extracted from the 20 websites with the highest click rates, and 18000 malicious JavaScript samples were obtained from a malware analysis platform. The 18000 normal and 18000 malicious JavaScript scripts were mixed at random to form a set of 36000 JavaScript scripts. The mixed analysis method was used to obtain 36000 syntax unit sequences and 36000 API call sequences, which were converted into word vector representations with the Word2Vec model, labeled, and divided into a training set and a test set at a ratio of 8:2.
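The 8:2 division of the balanced sample set can be sketched as follows. The sample identifiers, the fixed random seed, and the helper name `split_dataset` are assumptions for illustration; only the counts and the ratio come from the text.

```python
import random


def split_dataset(samples, train_parts=8, total_parts=10, seed=0):
    """Shuffle a copy of the samples and cut it at train_parts/total_parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_parts // total_parts
    return shuffled[:cut], shuffled[cut:]


# Placeholder samples: (identifier, label), label 1 = malicious, 0 = normal.
samples = ([(f"normal_{n}", 0) for n in range(18000)]
           + [(f"malicious_{n}", 1) for n in range(18000)])
train_set, test_set = split_dataset(samples)
print(len(train_set), len(test_set))
```

Because each sample carries both of its sequences and their shared label, splitting at the sample level keeps the API call sequence and the syntax unit sequence of one script in the same subset.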
The malicious JavaScript script detection network model based on feature fusion was built with Python 3.6 and the Keras deep learning framework on a host with 8 GB of memory running Ubuntu 18.04, and the model was trained on the training set to obtain a trained detection model. The detection model was then tested with the test set as input, and the detection accuracy of the invention was 99.73%. The average accuracy of the prior art is 97.8%, so the invention is about 1.9 percentage points higher, which shows that the malicious JS script detection method based on mixed analysis and feature fusion provided by the invention is superior to the prior art.

Claims (4)

1. A malicious JS script detection method based on mixed analysis and feature fusion is characterized by comprising the following steps:
(1) Acquire a training sample set and a test sample set:
(1a) Obtain $V$ JavaScript scripts $J = \{F_v \mid 1 \le v \le V\}$ and run each JavaScript script $F_v$. While $F_v$ runs, dynamically monitor its calls to the application program interface (API) to obtain the API call sequences $S_1 = \{W_1^v \mid 1 \le v \le V\}$ corresponding to $J$. At the same time, perform static syntax analysis on $F_v$ to obtain its abstract syntax tree $AST_v$, and visit the nodes of $AST_v$ with a depth-first traversal algorithm to obtain the syntax unit sequences $S_2 = \{W_2^v \mid 1 \le v \le V\}$ corresponding to $J$, where $V > 10000$, $F_v$ denotes the $v$-th JavaScript script, and $W_1^v$ and $W_2^v$ denote the API call sequence and the syntax unit sequence corresponding to $F_v$, respectively;
(1b) Using a Word2Vec model, convert each API call sequence $W_1^v$ and each syntax unit sequence $W_2^v$ into word vectors, obtaining the API call sequence word vectors $T_1 = \{X_1^v \mid 1 \le v \le V\}$ of $S_1$ and the syntax unit sequence word vectors $T_2 = \{X_2^v \mid 1 \le v \le V\}$ of $S_2$;
(1c) Label each API call sequence word vector $X_1^v$ and each syntax unit sequence word vector $X_2^v$ with the same label. More than half of the API call sequence word vectors in $T_1$ and more than half of the syntax unit sequence word vectors in $T_2$, together with the label shared by each pair of vectors, form the training sample set $Q_1$; the remaining API call sequence word vectors in $T_1$ and the remaining syntax unit sequence word vectors in $T_2$, together with the label shared by each pair of vectors, form the test sample set $Q_2$;
(2) Construct a malicious JavaScript script detection network model based on feature fusion:
Construct a malicious JavaScript script detection network model comprising a convolutional neural network (CNN) and a bidirectional long short-term memory network (BiLSTM) arranged in parallel, whose outputs are connected in sequence to a feature fusion module and a feature classifier $E$. The CNN comprises a convolution layer and a max-pooling layer stacked on each other, and the activation function of the convolution kernels in the convolution layer is ReLU; the BiLSTM comprises a forward LSTM layer and a backward LSTM layer stacked on each other, and the activation function of the LSTM units in the LSTM layers is sigmoid; $E$ comprises a fully connected layer and an output layer with a sigmoid activation function;
(3) Iteratively train the malicious JavaScript script detection network model based on feature fusion:
(3a) Initialize the iteration counter $i$ and the maximum number of iterations $I$, with $I \ge 200$; initialize the weight matrix and the bias matrix of the malicious JavaScript script detection network model based on feature fusion; and let $i = 0$;
(3b) Feed the training sample set $Q_1$ into the malicious JavaScript script detection network model based on feature fusion for forward propagation. The CNN extracts the high-dimensional features of the syntax unit sequence in each training sample, while the BiLSTM extracts the high-dimensional features of the API call sequence in each training sample. The feature fusion module fuses each high-dimensional feature extracted by the CNN with the corresponding high-dimensional feature extracted by the BiLSTM, and the feature classifier $E$ maps each fused high-dimensional feature to a vector and applies the sigmoid function to obtain the prediction probability of each training sample;
(3c) Use the cross-entropy loss function to compute the loss value $L_i$ of the malicious JavaScript script detection network model based on feature fusion in the current iteration; use back-propagation to compute, from $L_i$, the weight matrix gradient and the bias matrix gradient of the model; then update the weight matrix and the bias matrix by gradient descent;
(3d) Judge whether $i > I$ holds. If so, the trained malicious JavaScript script detection model based on feature fusion is obtained; otherwise, let $i = i + 1$ and return to step (3b);
(4) Obtain the detection result for malicious JavaScript scripts:
Feed the test sample set $Q_2$ into the trained malicious JavaScript script detection model based on feature fusion for forward propagation to obtain the prediction probability of each test sample. If the prediction probability of the $k$-th test sample reaches the decision threshold, the JavaScript script corresponding to the $k$-th test sample in $Q_2$ is malicious; otherwise, the JavaScript script is normal.
2. The malicious JS script detection method based on hybrid analysis and feature fusion of claim 1, wherein the Word2Vec model in step (1 b) includes an input layer, a hidden layer and an output layer connected in sequence, and an activation function of the output layer is a softmax function.
3. The malicious JS script detection method based on hybrid analysis and feature fusion of claim 1, wherein, in the malicious JavaScript script detection network model of step (2), the convolution layer of the convolutional neural network CNN contains 4 convolution kernels with sizes 7, 15, 35 and 55, respectively, and the forward LSTM layer and the backward LSTM layer of the bidirectional long short-term memory network BiLSTM each contain 128 LSTM units.
4. The malicious JS script detection method based on hybrid analysis and feature fusion according to claim 1, wherein the loss value L_i of the malicious JavaScript script detection network model in step (3c), the weight-matrix gradient, the offset-matrix gradient, and the update formulas of the weight matrix and the offset matrix are respectively as follows:
wherein the two per-sample symbols respectively denote the true label of the m-th training sample and the prediction probability of the m-th training sample, a true label of 1 indicating that the sample is malicious and a true label of 0 indicating that the sample is normal; the remaining symbols denote the updated weight matrix, the weight matrix before the update, the updated offset matrix and the offset matrix before the update; α_1 denotes the learning rate of the weight matrix, and α_2 denotes the learning rate of the offset matrix.
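The formulas of claim 4 did not survive in this text. A plausible reconstruction, assuming the standard binary cross-entropy loss and plain gradient descent, is given below. The symbol names are assumptions introduced here: $y_m$ for the true label, $\hat{y}_m$ for the prediction probability, $M$ for the number of training samples, $\omega$ and $b$ for the weight and offset matrices, and primes for the updated values. The $1/M$ averaging is likewise assumed. Only $L_i$, $\alpha_1$ and $\alpha_2$ appear in the surviving text.

```latex
\begin{aligned}
L_i &= -\frac{1}{M}\sum_{m=1}^{M}\Big[\, y_m \log \hat{y}_m + (1 - y_m)\log\big(1 - \hat{y}_m\big) \Big], \\
\nabla_{\omega} &= \frac{\partial L_i}{\partial \omega}, \qquad
\nabla_{b} = \frac{\partial L_i}{\partial b}, \\
\omega' &= \omega - \alpha_1 \nabla_{\omega}, \qquad
b' = b - \alpha_2 \nabla_{b}.
\end{aligned}
```

With a sigmoid output, the derivative of this loss with respect to the pre-activation value of sample $m$ reduces to $(\hat{y}_m - y_m)/M$, which is what back-propagation passes down to the earlier layers.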
CN202210252529.7A 2022-03-11 2022-03-11 Malicious JS script detection method based on mixed analysis and feature fusion Active CN114595454B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210252529.7A CN114595454B (en) 2022-03-11 2022-03-11 Malicious JS script detection method based on mixed analysis and feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210252529.7A CN114595454B (en) 2022-03-11 2022-03-11 Malicious JS script detection method based on mixed analysis and feature fusion

Publications (2)

Publication Number Publication Date
CN114595454A (en) 2022-06-07
CN114595454B (en) 2024-04-02

Family

ID=81818292

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210252529.7A Active CN114595454B (en) 2022-03-11 2022-03-11 Malicious JS script detection method based on mixed analysis and feature fusion

Country Status (1)

Country Link
CN (1) CN114595454B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663296A (en) * 2012-03-31 2012-09-12 杭州安恒信息技术有限公司 Intelligent detection method for Java script malicious code facing to the webpage
CN107180192A (en) * 2017-05-09 2017-09-19 北京理工大学 Android malicious application detection method and system based on multi-feature fusion
KR102120200B1 (en) * 2019-12-27 2020-06-17 주식회사 와이햇에이아이 Malware Crawling Method and System
CN111523117A (en) * 2020-04-10 2020-08-11 西安电子科技大学 Android malicious software detection and malicious code positioning system and method
CN112685738A (en) * 2020-12-29 2021-04-20 武汉大学 Malicious confusion script static detection method based on multi-stage voting mechanism

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080016339A1 (en) * 2006-06-29 2008-01-17 Jayant Shukla Application Sandbox to Detect, Remove, and Prevent Malware

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
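JavaScript malicious code detection method based on hybrid analysis; Qiao Xinbo (乔新博); CNKI Yanxue (知网研学); 2022-04-01; Chapters 3 and 4 *
Fine-grained code injection attack detection oriented to digital currency features; Sun Cong (孙聪) et al.; Journal of Computer Research and Development (计算机研究与发展); 2021-05-14; Vol. 58, No. 5; pp. 1035-1044 *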

Also Published As

Publication number Publication date
CN114595454A (en) 2022-06-07

Similar Documents

Publication Publication Date Title
CN108334781B (en) Virus detection method, device, computer readable storage medium and computer equipment
CN104881608B (en) A kind of XSS leak detection methods based on simulation browser behavior
CN104881607B (en) A kind of XSS leakage locations based on simulation browser behavior
CN111797407B (en) XSS vulnerability detection method based on deep learning model optimization
CN107944274A (en) A kind of Android platform malicious application off-line checking method based on width study
CN107360137A (en) Construction method and device for the neural network model of identifying code identification
CN110502897A (en) A kind of identification of webpage malicious JavaScript code and antialiasing method based on hybrid analysis
CN110287702A (en) A kind of binary vulnerability clone detection method and device
CN111866004B (en) Security assessment method, apparatus, computer system, and medium
CN112307473A (en) Malicious JavaScript code detection model based on Bi-LSTM network and attention mechanism
CN113821804B (en) Cross-architecture automatic detection method and system for third-party components and security risks thereof
CN104715190B (en) A kind of monitoring method and system of the program execution path based on deep learning
CN108229170B (en) Software analysis method and apparatus using big data and neural network
CN112685738B (en) Malicious confusion script static detection method based on multi-stage voting mechanism
CN107491691A (en) A kind of long-range forensic tools Safety Analysis System based on machine learning
CN115033895B (en) Binary program supply chain safety detection method and device
CN115017511A (en) Source code vulnerability detection method and device and storage medium
CN114036531A (en) Multi-scale code measurement-based software security vulnerability detection method
Oz et al. On the use of generative deep learning approaches for generating hidden test scripts
CN112035345A (en) Mixed depth defect prediction method based on code segment analysis
CN115935372A (en) Vulnerability detection method based on graph embedding and bidirectional gated graph neural network
CN115100739A (en) Man-machine behavior detection method, system, terminal device and storage medium
CN106815215A (en) The method and apparatus for generating annotation repository
CN114595454B (en) Malicious JS script detection method based on mixed analysis and feature fusion
CN113971284B (en) JavaScript-based malicious webpage detection method, equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant