CN112989348B - Attack detection method, model training method, device, server and storage medium - Google Patents

Info

Publication number
CN112989348B
Authority
CN
China
Prior art keywords
code
cfg
dfg
feature vector
attack detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110408265.5A
Other languages
Chinese (zh)
Other versions
CN112989348A (en
Inventor
张宏斌
张尼
许凤凯
薛继东
李末军
鞠奕明
王博闻
孙世豪
李庆科
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
6th Research Institute of China Electronics Corp
Original Assignee
6th Research Institute of China Electronics Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 6th Research Institute of China Electronics Corp
Priority to CN202110408265.5A
Publication of CN112989348A
Application granted
Publication of CN112989348B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1458 Denial of Service

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Virology (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Computer And Data Communications (AREA)

Abstract

The application provides an attack detection method, a model training method, a device, a server, and a storage medium. The method comprises: acquiring a JS code to be detected; analyzing the abstract syntax tree of the JS code to obtain the CFG and the DFG of the JS code; converting the CFG and the DFG into a feature vector; and inputting the feature vector into a pre-trained attack detection model to obtain the type of the JS code. Normal JS code and the various types of JS DDoS attack can thereby be distinguished effectively, enabling reliable detection of JS DDoS attacks. In addition, the method does not require an additional JS hook program to be implanted in the Web application; only an attack detection model needs to be trained on the server. It is therefore suitable for a wide range of Web applications and does not introduce unsafe factors.

Description

Attack detection method, model training method, device, server and storage medium
Technical Field
The application relates to the technical field of JS DDoS attack processing, and in particular to an attack detection method, a model training method, a device, a server, and a storage medium.
Background
DDoS (Distributed Denial of Service) refers to multiple attackers at different locations simultaneously attacking one or more targets, or to one attacker controlling multiple machines at different locations and using them to attack a victim simultaneously. An attacker uses compromised hosts (bots) to send a large number of requests to a target website in a short time, consuming the target server's host resources on a large scale so that it can no longer provide normal service. This disrupts normal use and can cause huge economic losses.
With the growing use of Web applications, DDoS attacks launched with JS (JavaScript) scripts are increasingly used by attackers. A JS DDoS attack is based on HTTP (Hypertext Transfer Protocol): carefully constructed JS scripts of various types establish TCP (Transmission Control Protocol) connections with a target Web server and continuously submit calls, such as queries and listings, that consume large amounts of database resources. This places a heavy load on the server until it refuses normal service to other users. Because the number of Web users is huge, the damage is severe once they are exploited by an attacker.
At present, one scheme for detecting JS DDoS attacks is to set up a JS hook program in advance, use the hook program to detect and judge whether each page in the browser carries a DDoS attack, and prevent the DDoS attack from recurring in pages returned for Web requests. However, because this scheme relies on a JS hook program to detect DDoS attacks in the page, an additional piece of code must be embedded in the Web page, so the method applies only to Web applications whose pages can be modified to embed the JS hook program. For Web applications that use HTTPS (Hypertext Transfer Protocol over Secure Socket Layer), which is now the norm, the pages cannot be modified to embed the JS hook program, so the scheme's applicability is limited. In addition, the scheme increases traffic consumption and introduces unsafe factors.
Disclosure of Invention
An object of the embodiments of the present application is to provide an attack detection method, a model training method, an attack detection device, a model training device, a server, and a storage medium, so as to solve the problems of limited application and insecurity of the existing solutions.
The embodiment of the application provides an attack detection method, which comprises the following steps: acquiring a JS code to be detected; analyzing the abstract syntax tree of the JS code to obtain a CFG (Control Flow Graph) and a DFG (Data Flow Graph) of the JS code; converting the CFG and the DFG into feature vectors; and inputting the feature vectors into a pre-trained attack detection model to obtain the type of the JS code. The types of JS code comprise types corresponding to JS DDoS attacks and a type corresponding to normal JS code.
In the implementation process, the CFG and the DFG of the JS code are obtained by analyzing its abstract syntax tree, and are then used as the core features of the JS code for classification and identification. JS codes belonging to different types of JS DDoS attack differ from one another in their control flow and/or data flow characteristics, and JS code used for JS DDoS attacks differs from normal JS code in the same respects. By detecting on the basis of the CFG and the DFG, the embodiment of the application can therefore distinguish normal JS code from the various types of JS DDoS attack and detect JS DDoS attacks reliably. In addition, the attack detection method does not require an additional JS hook program to be implanted in the Web application; only the attack detection model needs to be trained on the server. The method is therefore suitable for a wide range of Web applications, adds no extra traffic consumption, and introduces no unsafe factors.
Further, converting the CFG and the DFG into feature vectors, comprising: traversing the CFG and the DFG to form flow paths of the CFG and the DFG; and carrying out numerical processing on the flow paths of the CFG and the DFG to obtain the feature vectors corresponding to the CFG and the DFG.
In the implementation process, traversing the CFG and the DFG yields their flow paths, which are then converted into numerical form. The resulting feature vector fully reflects the control flow and data flow characteristics of the JS code, which supports the accuracy of the JS code type output in the embodiment of the application.
Further, after obtaining the type of the JS code, the method further includes: and processing the JS code according to a preset processing strategy corresponding to the type.
In the implementation process, the processing strategies corresponding to the types are preset, so that after the type of the JS code is determined, targeted processing can be achieved, and therefore effective processing of JS DDoS attacks is guaranteed.
The embodiment of the application further provides a model training method, which comprises the following steps: acquiring JS codes of various JS DDoS attacks and normal JS codes; analyzing the abstract syntax tree of each JS code to obtain the CFG and the DFG of each JS code; converting the CFG and the DFG of each JS code into a feature vector, and marking a label of the feature vector of each JS code according to the type of the JS code corresponding to each feature vector; and inputting the feature vector of each JS code and the label of the feature vector of each JS code into a preset attack detection model for training to obtain the trained attack detection model.
The attack detection model obtained by this training process can classify JS codes reliably using their CFG and DFG. When this model is used for JS DDoS attack detection, no additional JS hook program needs to be implanted in the Web application, so the scheme is suitable for a wide range of Web applications, adds no extra traffic consumption, and introduces no unsafe factors.
Further, the attack detection model is a deep learning algorithm model.
In the embodiment of the application, a deep learning algorithm model is adopted as the attack detection model, so that complex feature engineering is not needed during detection processing, and higher identification precision can be achieved.
Further, acquiring the JS codes of various JS DDoS attacks includes: acquiring currently existing first JS codes of various JS DDoS attacks; and generating, according to a preset code generation rule, second JS codes of various JS DDoS attacks on the basis of the first JS codes. The second JS codes include plaintext JS code and/or obfuscated JS code.
In the actual application process, the currently discovered JS DDoS attacks are fewer in number compared with other DDoS attacks of various types, so that the number of samples aiming at the JS DDoS attacks may be insufficient. Therefore, in the embodiment of the application, content transformation can be performed on the currently existing first JS code based on the preset code generation rule, so that second JS codes of various JS DDoS attacks are generated to meet the sample requirement of model training.
An embodiment of the present application further provides an attack detection apparatus, including: the device comprises a first acquisition module, a first analysis module, a first conversion module and a detection module; the first acquisition module is used for acquiring the JS code to be detected; the first analysis module is used for analyzing the abstract syntax tree of the JS code to obtain a control flow graph CFG and a data flow graph DFG of the JS code; the first conversion module is configured to convert the CFG and the DFG into feature vectors; the detection module is used for inputting the feature vector into a pre-trained attack detection model to obtain the type of the JS code; the type of the JS code comprises: the corresponding type of JS DDoS attack and the corresponding type of normal JS code.
The embodiment of the present application further provides a model training device, including: the system comprises a second acquisition module, a second analysis module, a second conversion module and a training module; the second acquisition module is used for acquiring JS codes of various JS DDoS attacks and normal JS codes; the second analysis module is used for analyzing the abstract syntax tree of each JS code to obtain a control flow graph CFG and a data flow graph DFG of each JS code; the second conversion module is used for converting the CFG and the DFG of each JS code into a feature vector, and marking a label of the feature vector of each JS code according to the type of the JS code corresponding to each feature vector; and the training module is used for inputting each of the feature vectors of the JS codes and each of the labels of the feature vectors of the JS codes into a preset attack detection model for training to obtain the trained attack detection model.
The embodiment of the application also provides a server, which comprises a processor, a memory and a communication bus; the communication bus is used for realizing connection communication between the processor and the memory; the processor is configured to execute one or more programs stored in the memory to implement any of the attack detection methods described above, or to implement any of the model training methods described above.
The present embodiments also provide a readable storage medium, where one or more programs are stored, where the one or more programs are executable by one or more processors to implement any of the above attack detection methods or any of the above model training methods.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a schematic flowchart of a method for training an attack detection model according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of an attack detection method according to an embodiment of the present application;
Fig. 3 is a schematic diagram of an internal module hierarchy of a software system implementing aspects of the present application, provided by an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an attack detection apparatus according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an attack detection model training apparatus according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
The first embodiment is as follows:
To detect JS DDoS attacks reliably, the embodiment of the application provides an attack detection method in which an attack detection model is trained in advance and then used to detect JS DDoS attacks. In this method, the training quality of the attack detection model directly affects the reliability of JS DDoS attack detection. Therefore, the embodiment of the application also provides a model training method.
In order to facilitate understanding of the scheme of the embodiment of the present application, a model training method provided in the embodiment of the present application is described below.
As shown in fig. 1, fig. 1 is a schematic flow chart of a model training method provided in an embodiment of the present application, including:
S101: Acquire JS codes of various JS DDoS attacks and normal JS codes.
In the embodiment of the application, JS DDoS attacks can be classified in advance according to their attack characteristics. For example, they may be classified into: a direct bandwidth consumption class, a resource forgery consumption class, a service performance consumption class, a protocol stack performance consumption class, a control resource consumption class, and an infrastructure performance consumption class. The classification is shown in Table 1 below:
Table 1
Category | Description
Direct bandwidth consumption class | Initiating a large number of HTTP GET/POST requests directly to a target server
Resource forgery consumption class | Forging HTTP Header information and filling it with large amounts of irrelevant data
Service performance consumption class | Requesting an API (Application Program Interface) that requires a large amount of computation or database querying, depending on the specific business logic of the target server
Protocol stack performance consumption class | Establishing a large number of WebSocket connections with the target server in a short time
Control resource consumption class | Requesting a widget resource, or requesting a target server resource using a widget
Infrastructure performance consumption class | Initiating a large number of requests to the DNS (Domain Name System) server of the target server
Normal class | Non-offensive JS code
It should be understood that, in the embodiment of the present application, JS code that causes each category of JS DDoS attack can be acquired by category, together with normal JS code that does not cause a JS DDoS attack.
It should be noted that, in practice, far fewer JS DDoS attacks have been discovered so far than DDoS attacks of other types, so the number of available JS DDoS attack samples may be insufficient. Therefore, in the embodiment of the application, the content of the existing first JS code can be transformed according to a code generation rule to generate second JS codes of various JS DDoS attacks, so as to meet the sample requirement of model training.
In the embodiment of the application, the code generation rules can be defined in any of various existing JS code generators, so that second JS codes of various JS DDoS attacks are generated from the existing first JS codes.
For example, in the embodiment of the present application, conventional transformation operations may be performed on the existing first JS code, such as renaming local variables, removing whitespace, compressing the code (for example, removing trailing semicolons), and replacing constants. In addition, key elements of the first JS code (such as string constants) can be extracted into an array and referenced by array subscript, which makes the code harder to read. These key elements can also be encoded or encrypted. Furthermore, the control flow logic can be transformed according to the CFG of the first JS code, and invalid code can be injected at random using a code injection mechanism.
It should be understood that, in practical applications, the second JS code can be generated by selecting to use part or all of the operations in the above example or other ways, which are not shown in the above example, for implementing the content transformation on the JS code.
It should be noted that, in the embodiment of the present application, the generated second JS code may be plaintext JS code or obfuscated JS code. Plaintext and obfuscated JS code can also both be generated to improve the training effect.
In the embodiment of the application, the generated second JS code is not exactly the same as the first JS code in terms of the syntax structure or variable definition, but the functional logic of the second JS code is identical to that of the first JS code.
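One of the transformations described above, moving string constants into an array and referencing them by subscript, can be sketched roughly as follows. This is a minimal illustration, not the patent's generator: the function name `extract_strings_to_array`, the array name `_s`, and the sample JS snippet are all hypothetical, and the regular expression only handles simple quoted literals (it would also match quotes inside comments or regex literals, which a real tool built on a JS parser would avoid).

```python
import re

# Matches simple single- or double-quoted JS string literals.
# Assumption: no template literals, and no quotes inside comments or regexes.
STRING_LITERAL = re.compile(r'"(?:[^"\\\n]|\\.)*"|\'(?:[^\'\\\n]|\\.)*\'')


def extract_strings_to_array(js_source: str, array_name: str = "_s") -> str:
    """Move string literals into one array and replace each use with a subscript."""
    literals: list[str] = []

    def replace(match: re.Match) -> str:
        literal = match.group(0)
        if literal not in literals:
            literals.append(literal)
        return f"{array_name}[{literals.index(literal)}]"

    body = STRING_LITERAL.sub(replace, js_source)
    header = f"var {array_name} = [{', '.join(literals)}];\n"
    return header + body


# Hypothetical first JS code used only for illustration.
sample = 'fetch("http://target.example/api"); log("sent"); log("sent");'
transformed = extract_strings_to_array(sample)
print(transformed)
```

The transformed code differs from the input in its syntax structure while keeping the same functional logic, which is the property required of the second JS code.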
It should be further understood that the above classification was derived from the research of the inventors of the present application. In practice, JS DDoS attacks may also be classified into other categories according to their attack characteristics, as long as the classification is reasonable; the embodiment of the application does not limit the classification.
S102: Analyze the abstract syntax tree of each JS code to obtain the CFG and the DFG of each JS code.
In this embodiment of the present application, an AST (Abstract Syntax Tree) may be created for each JS code acquired in step S101, so that the statements in the JS code are mapped to nodes of the AST.
It should be noted that, in the embodiment of the present application, the AST can be constructed by any of various existing AST construction methods; the embodiment of the present application is not limited in this respect.
After the abstract syntax tree of each JS code is constructed, it can be analyzed to build the CFG and the DFG. The CFG is an abstract representation of a program that captures all paths that may be traversed during execution, while the DFG graphically represents, from the perspective of data transfer and processing, the logical functions of the system, the logical flow of data within it, and the logical transformations applied to the data. By parsing each node of the AST, the corresponding control flow target or data flow target can be found, and each node can be connected to other flow targets; for example, an IF statement may be connected to a THEN flow target and an ELSE flow target, depending on the code. The CFG and the DFG are formed from these flow targets.
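The idea of connecting each AST node to its flow targets can be illustrated with a small sketch. It is not the patent's parser: the AST here is a hand-written list of ESTree-style dictionaries, the `label` field and the `build_cfg` function are hypothetical, and only plain statements and IF statements are handled.

```python
from itertools import count


def build_cfg(statements):
    """Return (nodes, edges) for a list of ESTree-like statement dicts.

    Only simple statements and IfStatement are handled in this sketch.
    Edges are (source_id, target_id, label) tuples.
    """
    ids = count()
    nodes, edges = {}, []

    def add_node(label):
        node_id = next(ids)
        nodes[node_id] = label
        return node_id

    def visit_block(stmts, preds):
        # preds: (node_id, edge_label) pairs that flow into the next statement.
        for stmt in stmts:
            preds = visit(stmt, preds)
        return preds

    def visit(stmt, preds):
        node = add_node(stmt.get("label", stmt["type"]))
        for pred, edge_label in preds:
            edges.append((pred, node, edge_label))
        if stmt["type"] == "IfStatement":
            # An IF node is connected to a THEN target and an ELSE target.
            then_exits = visit_block(stmt["consequent"], [(node, "THEN")])
            else_exits = visit_block(stmt.get("alternate", []), [(node, "ELSE")])
            return then_exits + else_exits
        return [(node, "NEXT")]

    entry = add_node("ENTRY")
    exits = visit_block(statements, [(entry, "NEXT")])
    exit_node = add_node("EXIT")
    for pred, edge_label in exits:
        edges.append((pred, exit_node, edge_label))
    return nodes, edges


# Hand-written AST for: n = 0; if (n < 10) { send(); } else { stop(); }
ast = [
    {"type": "VariableDeclaration", "label": "n = 0"},
    {
        "type": "IfStatement",
        "label": "if (n < 10)",
        "consequent": [{"type": "ExpressionStatement", "label": "send()"}],
        "alternate": [{"type": "ExpressionStatement", "label": "stop()"}],
    },
]
nodes, edges = build_cfg(ast)
for edge in edges:
    print(edge)
```

A DFG could be built with the same node-by-node walk by recording, for each variable, edges from the statement that defines it to the statements that use it.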
S103: Convert the CFG and the DFG of each JS code into a feature vector, and mark the label of the feature vector of each JS code according to the type of the JS code corresponding to each feature vector.
It should be understood that both the CFG and the DFG are graphical, and for ease of model processing, the CFG and the DFG need to be converted to feature vectors.
In the embodiment of the present application, the CFG and the DFG may be traversed to form flow paths of the CFG and the DFG, and then the flow paths of the CFG and the DFG are subjected to a numerical processing to obtain feature vectors corresponding to the CFG and the DFG.
It should be noted that the flow path described above refers to a node sequence formed by traversing the CFG and the DFG according to a tree structure. In the flow path, the basic syntax structure information, the control flow information, the data flow information, and the like of the parsed JS code are included, for example: brackets, equal signs, semicolons, function names, variable names and the like in the JS codes.
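As a rough illustration of forming a flow path, the sketch below walks a small graph depth-first and records each node once, in visit order. The traversal order is an assumption (the text does not fix one), and the `flow_path` function and the example graph are hypothetical.

```python
def flow_path(nodes, edges, start=0):
    """Return the node labels in depth-first pre-order, visiting each node once."""
    successors = {}
    for src, dst, _label in edges:
        successors.setdefault(src, []).append(dst)

    path, seen, stack = [], set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        path.append(nodes[node])
        # Push in reverse so that the first successor is visited first.
        stack.extend(reversed(successors.get(node, [])))
    return path


# Hypothetical CFG for: n = 0; if (n < 10) { send(); } else { stop(); }
cfg_nodes = {0: "ENTRY", 1: "n = 0", 2: "if (n < 10)", 3: "send()", 4: "stop()", 5: "EXIT"}
cfg_edges = [
    (0, 1, "NEXT"), (1, 2, "NEXT"),
    (2, 3, "THEN"), (2, 4, "ELSE"),
    (3, 5, "NEXT"), (4, 5, "NEXT"),
]
print(flow_path(cfg_nodes, cfg_edges))
```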
In the embodiment of the present application, after the flow paths of the CFG and the DFG are obtained, a numerical labeling processing technique may be adopted to identify each character or character string in the flow paths by a unique numerical ID, so as to obtain feature vectors corresponding to the CFG and the DFG.
It should be understood that, in the embodiment of the present application, a reference dictionary may be preset, and a unique digital ID identifier corresponding to each character, for example, a corresponds to 1208, b corresponds to 1367, and the like, may be set in the reference dictionary, so that the conversion of the flow path into the feature vector may be implemented by using the reference dictionary.
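A reference dictionary of this kind can be sketched as a token-to-ID mapping. The tokenizer pattern, the reserved `<PAD>` and `<UNK>` entries, and the function names below are assumptions made for illustration; the IDs produced here are simply assigned in order of first appearance rather than taken from a preset table such as the 1208 and 1367 values mentioned above.

```python
import re

# Assumed tokenizer: identifiers, integer literals, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_$][\w$]*|\d+|\S")


def build_reference_dictionary(flow_paths):
    """Assign a unique integer ID to every token seen in the given flow paths."""
    vocab = {"<PAD>": 0, "<UNK>": 1}  # reserved IDs (an assumption of this sketch)
    for path in flow_paths:
        for entry in path:
            for token in TOKEN.findall(entry):
                vocab.setdefault(token, len(vocab))
    return vocab


def encode(path, vocab):
    """Convert one flow path into a list of integer IDs using the dictionary."""
    return [
        vocab.get(token, vocab["<UNK>"])
        for entry in path
        for token in TOKEN.findall(entry)
    ]


training_path = ["n = 0", "send()"]
vocab = build_reference_dictionary([training_path])
print(encode(training_path, vocab))
print(encode(["stop()"], vocab))  # "stop" was never seen, so it maps to <UNK>
```

The same dictionary has to be reused at detection time so that identical tokens map to identical IDs.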
It should be understood that, in the embodiment of the present application, the CFG and the DFG may each be converted into a separate feature vector, and the two feature vectors may then be fed to the attack detection model together as a pair of inputs for training. Alternatively, after the feature vectors of the CFG and the DFG are obtained, they may be combined into one feature vector (for example, by directly concatenating them), and the combined feature vector may be used to train the attack detection model.
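The direct splicing option can be sketched as below. Padding or truncating each vector to a fixed length before concatenation is an assumption of this sketch (it keeps the CFG and DFG parts at fixed positions in the merged vector); the lengths and function names are hypothetical.

```python
def pad_or_truncate(vector, length, pad_id=0):
    """Return the vector cut or right-padded with pad_id to exactly `length` items."""
    return (vector + [pad_id] * length)[:length]


def merge_features(cfg_vector, dfg_vector, cfg_len=8, dfg_len=8):
    """Concatenate fixed-length CFG and DFG vectors into one feature vector."""
    return pad_or_truncate(cfg_vector, cfg_len) + pad_or_truncate(dfg_vector, dfg_len)


merged = merge_features([2, 3, 4], [5, 6], cfg_len=4, dfg_len=3)
print(merged)
```

Whatever combination is used in training must be applied in the same way to the JS code to be detected.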
It should be noted that, in the embodiment of the present application, training the attack detection model requires that the type corresponding to each converted feature vector be known. Therefore, when the electronic device converts the CFG and the DFG of each JS code into a feature vector, it automatically labels the feature vector with the type of the corresponding JS code. For example, the label for normal JS code is marked as "normal", the label for JS code of the "direct bandwidth consumption class" is marked as "direct bandwidth consumption class", and so on.
It should be understood that the labels for the feature vectors may also be represented by numbers such as "0", "1", "2", or characters such as "a", "b", "c", as long as the actual types corresponding to the numbers or characters are associated in advance.
Of course, in the embodiment of the present application, the label of each feature vector may also be marked in a manual marking manner.
S104: Input the feature vector of each JS code and the label of the feature vector of each JS code into a preset attack detection model for training to obtain the trained attack detection model.
In the embodiment of the present application, the attack detection model may be implemented with a deep learning model, for example a convolutional neural network model, a recurrent neural network model, a recursive neural network model, or an LSTM (long short-term memory) network model.
In the embodiment of the application, a function such as categorical cross-entropy can be used as the loss function of the attack detection model to calculate the loss value during training.
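For reference, categorical cross-entropy for a single sample can be computed as in the sketch below, where the model is assumed to output one raw score (logit) per class and a softmax turns the scores into probabilities. The seven classes correspond to the six attack classes plus the normal class; the function names are illustrative only.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(logits, true_class):
    """Negative log-probability that the model assigns to the true class."""
    return -math.log(softmax(logits)[true_class])


uniform_loss = categorical_cross_entropy([0.0] * 7, true_class=3)
confident_loss = categorical_cross_entropy([0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0], true_class=3)
print(uniform_loss, confident_loss)
```

With uniform scores over seven classes the loss equals ln 7, and it falls as the score of the true class rises.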
In addition, in the embodiment of the present application, an algorithm such as gradient descent, Adam, Adadelta, or Adagrad may be used as the optimizer to update the network parameters of the attack detection model.
In addition, in the embodiment of the present application, metrics such as the F1 score and the ROC (receiver operating characteristic) curve may be used to evaluate the attack detection model after each training iteration, so as to determine whether training can end.
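An F1-based evaluation for a multi-class model can be sketched as follows. Averaging the per-class F1 scores with equal weight (macro-F1) is an assumption, since the text only names F1, and the example labels are made up for illustration.

```python
def macro_f1(y_true, y_pred):
    """Average the per-class F1 scores with equal weight per class."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up labels: 0 = normal, 1 and 2 = two attack classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

Training could, for instance, stop once this score on a held-out set stops improving; the stopping rule itself is not specified in the text.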
It should be understood that the loss function, optimizer, and evaluation index described above are only some of the loss functions, optimizers, and evaluation indexes that can be used as exemplified in the embodiments of the present application, and are not intended to limit the embodiments of the present application.
After the attack detection model is trained, the detection of the JS DDoS attack can be realized by adopting the trained attack detection model. Referring to fig. 2, fig. 2 is a schematic flow chart of an attack detection method provided in an embodiment of the present application, including:
S201: Acquire the JS code to be detected.
It should be understood that, in the embodiment of the present application, the attack detection model may be set at a location such as a portal server of a Web (Web page) application, so as to implement detection of each JS code received.
S202: Analyze the abstract syntax tree of the JS code to obtain the CFG and the DFG of the JS code.
In this step, the specific processes of obtaining the CFG and the DFG of the JS code are consistent with the specific processes of obtaining the CFG and the DFG of each JS code in the training process, and therefore, the details are not repeated here.
S203: Convert the CFG and the DFG into feature vectors.
In this step, the specific process of converting CFG and DFG into feature vectors is the same as the specific process of converting CFG and DFG into feature vectors in the training process, and therefore, the detailed description thereof is omitted here.
S204: Input the feature vector into a pre-trained attack detection model to obtain the type of the JS code.
It should be noted that the input format must match the one used in training. If the feature vector of the CFG and the feature vector of the DFG were input as a pair during training, the feature vectors of the CFG and the DFG of the JS code to be detected are likewise input as a pair into the pre-trained attack detection model. If a feature vector obtained by combining the feature vectors of the CFG and the DFG was used as input during training, the feature vectors of the CFG and the DFG of the JS code to be detected must be combined in the same way, and the combined feature vector is then input into the pre-trained attack detection model.
In the embodiment of the application, the attack detection model outputs the type of the JS code to be detected.
In the embodiment of the application, the processing strategies corresponding to each type of JS code can be configured in advance, and after the type of the JS code to be detected is output, the JS code is processed according to the processing strategy corresponding to the type.
For example: when the type of the JS code is "normal", the JS code can be executed normally; when the type is "direct bandwidth consumption class", the HTTP GET/POST requests initiated by the attack source IP can be discarded; when the type is "resource forgery consumption class", forged HTTP header information can be filtered; when the type is "service performance consumption class", the attack source IP can be blocked; when the type is "protocol stack performance consumption class", WebSocket connections initiated by the attack source IP can be blocked; when the type is "control resource consumption class", the control resource can be cached, or requests for the control resource from the attack source IP can be blocked; when the type is "infrastructure performance consumption class", the DNS can be cached.
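A preconfigured mapping from type to processing strategy could be sketched as a lookup table. The dictionary keys, the action strings, and the fallback behaviour for an unknown type are hypothetical; only the type names and the actions themselves come from the example above.

```python
# Hypothetical policy table: detected JS code type -> configured action.
POLICIES = {
    "normal": "execute the JS code normally",
    "direct bandwidth consumption": "discard HTTP GET/POST requests from the attack source IP",
    "resource forgery consumption": "filter forged HTTP header information",
    "service performance consumption": "block the attack source IP",
    "protocol stack performance consumption": "block WebSocket connections from the attack source IP",
    "control resource consumption": "cache the control resource or block requests for it from the attack source IP",
    "infrastructure performance consumption": "cache the DNS",
}


def select_policy(js_type: str) -> str:
    """Return the configured action for a detected type.

    Falling back to manual review for an unknown type is an assumption,
    not something the application prescribes.
    """
    return POLICIES.get(js_type, "flag for manual review")


action = select_policy("service performance consumption")
```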
In this embodiment of the application, when the JS code is detected to be of any type other than "normal", information related to the JS code may also be acquired, for example the source IP (Internet Protocol) address and source port from which the JS code originates, the requested URL (Uniform Resource Locator), the request method, the request time, the Referer information, the browser User-Agent field, and the like.
The acquired information related to the JS code can be stored and visualized, which provides clues for tracing the source of a JS DDoS attack.
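As a rough illustration of how the information related to a detected JS code might be stored for later tracing, the sketch below serializes one record to a JSON line. The class name `AttackRecord`, the field names, and the sample values (a documentation-range IP address and an example.com URL) are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class AttackRecord:
    """One detected non-normal JS code and the request that carried it."""
    js_type: str
    source_ip: str
    source_port: int
    url: str
    method: str
    request_time: str
    referer: str
    user_agent: str


def to_json_line(record: AttackRecord) -> str:
    """Serialize a record as one JSON line, suitable for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


record = AttackRecord(
    js_type="service performance consumption",
    source_ip="203.0.113.7",           # illustrative documentation address
    source_port=51234,
    url="https://example.com/search",  # illustrative URL
    method="GET",
    request_time="2021-04-15T08:00:00Z",
    referer="https://example.com/",
    user_agent="ExampleBrowser/1.0",
)
line = to_json_line(record)
```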
In addition, in the embodiment of the application, the configuration management of the front-end user, the alarm of the JS DDoS attack and the like can be visualized, so that the management and the source tracing are facilitated.
According to the attack detection method and the attack detection model training method provided in the embodiments of the present application, the CFG and the DFG of a JS code are obtained by parsing the abstract syntax tree of the JS code, and the CFG and the DFG are then used as the core features of the JS code to classify and identify it. JS codes of different types of JS DDoS attack differ from one another in their control flow and/or data flow characteristics, and JS codes used for JS DDoS attacks differ in these characteristics from normal JS codes. Detection based on the CFG and the DFG can therefore effectively distinguish normal JS codes from the various types of JS DDoS attack, which enables reliable detection of JS DDoS attacks. In addition, the attack detection method does not require an extra JS hook program to be implanted into the Web application; the attack detection model only needs to be trained on the server. The scheme is therefore applicable to a wide range of Web applications, adds no extra traffic consumption, and introduces no additional security risk.
Example two:
This embodiment builds on the first embodiment and further illustrates the present application, taking as an example the case where the attack detection model is an LSTM model and the scheme is implemented as a software system.
Referring to fig. 3, the software system may be divided into a sample data generation module, a sample data preprocessing module, a model construction module, an identification blocking module, and a visual display module. Wherein:
The sample data generation module mainly has two parts: classifying the original JS code samples of JS DDoS attacks, and generating plaintext or obfuscated JS code sample data of JS DDoS attacks.
Referring to table one, in this embodiment the JS codes that cause JS DDoS attacks are divided into six types: direct bandwidth consumption, resource forgery consumption, service performance consumption, protocol stack performance consumption, control resource consumption, and infrastructure performance consumption. These six types of JS code, together with normal JS code, constitute the seven types of JS code sample used in this embodiment.
For each JS code type, a large number of plaintext or obfuscated JS codes of JS DDoS attacks are generated. The syntax structure or variable definitions of the generated JS codes are not identical to those of the original JS DDoS attack codes, but their functional logic is consistent with the originals. At the same time, a large number of normal, non-attack JS codes are selected from open source websites. The generated JS codes and the normal JS codes are then used together for feature extraction and model training.
The sample data preprocessing module builds an abstract syntax tree for each generated JS code in order to analyze the structure of the JS code.
After the abstract syntax tree is built, it is parsed to construct the CFG and the DFG.
For the CFG and the DFG of each JS code, a node sequence (i.e., a flow path) is formed by a depth-first tree-structure search algorithm. The components of the node sequence include the basic syntax structure information, control flow information, and data flow information of the payload, for example brackets, equal signs, semicolons, function names, and variable names. The serialized information is then numerically labeled to form the feature vector that is input into the LSTM model.
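The depth-first traversal and numerical labeling can be sketched in Python as follows. The adjacency-list representation of the graph, the node labels, the function names, and the reserved ID 0 for unknown tokens are assumptions for illustration; the application only states that a node sequence is formed depth-first and then mapped to numbers.

```python
from typing import Dict, List


def depth_first_sequence(graph: Dict[str, List[str]], root: str) -> List[str]:
    """Return node labels in depth-first (pre-order) visiting order."""
    seen, order, stack = set(), [], [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push children in reverse so the first child is visited first.
        for child in reversed(graph.get(node, [])):
            stack.append(child)
    return order


def encode(tokens: List[str], vocab: Dict[str, int], unk_id: int = 0) -> List[int]:
    """Map each token to its numeric ID; unknown tokens get unk_id (assumed)."""
    return [vocab.get(token, unk_id) for token in tokens]


# Toy control flow graph with hypothetical node labels.
cfg = {
    "entry": ["if", "exit"],
    "if": ["call:fetch", "exit"],
    "call:fetch": ["exit"],
    "exit": [],
}
vocab = {"entry": 1, "if": 2, "call:fetch": 3, "exit": 4}

flow_path = depth_first_sequence(cfg, "entry")
feature_vector = encode(flow_path, vocab)
```

The same two functions would be applied to the DFG, and the same dictionary must be used in training and detection so that a given token always maps to the same ID.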
After the sample data preprocessing module preprocesses the sample, the model construction module trains the LSTM model by adopting each feature vector obtained by preprocessing and the type of the JS code corresponding to each feature vector.
In this embodiment, the tensor dimension of the LSTM model input may be [batch_size, sequence_length, input_dimension], and the tensor dimension of the output may be [batch_size, output_dimension].
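A minimal sketch of how variable-length numeric sequences could be brought into a fixed three-dimensional input shape is given below. Padding with ID 0, truncating long sequences, and using an input dimension of 1 (each token ID as a single feature) are assumptions; the application does not state how sequences are padded or embedded.

```python
from typing import List, Sequence, Tuple


def pad_batch(sequences: Sequence[Sequence[int]],
              sequence_length: int,
              pad_id: int = 0) -> List[List[List[float]]]:
    """Pad or truncate each sequence to sequence_length.

    Each token ID becomes a one-element feature list, so the result has
    shape [batch_size, sequence_length, 1].
    """
    batch = []
    for seq in sequences:
        clipped = list(seq[:sequence_length])
        padded = clipped + [pad_id] * (sequence_length - len(clipped))
        batch.append([[float(token_id)] for token_id in padded])
    return batch


def shape(batch: List[List[List[float]]]) -> Tuple[int, int, int]:
    """Return (batch_size, sequence_length, input_dimension)."""
    return (len(batch), len(batch[0]), len(batch[0][0]))


batch = pad_batch([[1, 2, 3], [4], [5, 6, 7, 8, 9]], sequence_length=4)
batch_shape = shape(batch)
```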
In this embodiment, at each update during training, a given proportion of the input units is randomly set to 0 (dropout) to help prevent overfitting.
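Randomly zeroing a proportion of the input units is the technique commonly known as dropout. The sketch below shows the idea on a plain list of values. Rescaling the kept values by 1 / (1 - rate), often called inverted dropout, is an assumption borrowed from common deep learning practice and is not stated in the application; the function name and the seeded random generator are likewise illustrative.

```python
import random
from typing import List, Sequence


def dropout(values: Sequence[float], rate: float, rng: random.Random) -> List[float]:
    """Zero each value with probability `rate`; rescale the rest (assumed)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


inputs = [1.0, 2.0, 3.0, 4.0]
dropped = dropout(inputs, rate=0.5, rng=random.Random(0))
```

Dropout is applied only during training; at detection time the inputs are passed through unchanged.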
In this embodiment, the LSTM model may include 100 memory units, the output layer may be a fully connected layer with 7 classes, the activation function may be set to softmax, the loss function may be categorical cross-entropy (Categorical Crossentropy), and AdamOptimizer may be selected as the optimizer, with the aim of minimizing the loss value of the LSTM model. Metrics such as F1 and ROC can be used to evaluate the quality of the trained LSTM model.
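To make the output layer and loss concrete, the following sketch computes a softmax over seven class scores and the categorical cross-entropy for one sample in plain Python. It illustrates the two formulas only; it is not the LSTM model itself, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(probs: Sequence[float], true_index: int) -> float:
    """Loss for one sample whose correct class is true_index."""
    return -math.log(probs[true_index])


# Seven equal scores: the model is maximally unsure among the 7 classes.
probs = softmax([0.0] * 7)
loss = categorical_cross_entropy(probs, true_index=3)
```

With seven equally likely classes the loss equals ln 7, which is a useful baseline: a trained model should reach a noticeably lower average loss.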
After training is finished, the identification blocking module can use the trained LSTM model to identify JS DDoS attacks. The input of the LSTM model is the feature vectors of the CFG and the DFG obtained from a JS code of unknown type after AST parsing and numerical processing, and the output is the class of that JS code, namely one of seven classes: direct bandwidth consumption class, resource forgery consumption class, service performance consumption class, protocol stack performance consumption class, control resource consumption class, infrastructure performance consumption class, and normal code.
If the JS code belongs to the normal class, the identification blocking module does not need to perform any processing. If the JS code is detected to belong to one of the other six classes, a JS DDoS attack early warning is triggered and the attack is handled and blocked according to its specific type, for example by issuing payload code feature rules to a firewall for blocking.
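The decision made by the identification blocking module can be sketched as follows: take the class with the highest probability and raise a warning only when it is not the normal class. The order of the class names, the function names, and the format of the returned dictionary are assumptions for illustration.

```python
from typing import Dict, Optional, Sequence, Union

# Hypothetical class order; a real system must use the order fixed in training.
CLASS_NAMES = [
    "direct bandwidth consumption",
    "resource forgery consumption",
    "service performance consumption",
    "protocol stack performance consumption",
    "control resource consumption",
    "infrastructure performance consumption",
    "normal",
]


def classify(probabilities: Sequence[float]) -> str:
    """Return the class name with the highest predicted probability."""
    if len(probabilities) != len(CLASS_NAMES):
        raise ValueError("expected one probability per class")
    index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return CLASS_NAMES[index]


def handle(probabilities: Sequence[float]) -> Dict[str, Union[str, bool, Optional[str]]]:
    """Do nothing for normal code; otherwise raise a warning and pick a block rule."""
    label = classify(probabilities)
    if label == "normal":
        return {"type": label, "alert": False, "action": None}
    return {"type": label, "alert": True, "action": f"block:{label}"}


result = handle([0.05, 0.05, 0.70, 0.05, 0.05, 0.05, 0.05])
```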
In this embodiment, the visual display module builds a JS DDoS attack monitoring platform to visually display the JS DDoS attacks detected by the LSTM model. This includes visualization of the configuration management of front-end users and of attack alarms, as well as visualization of the source IP and source port that initiated the JS DDoS attack request, the requested URL, the request method, the request time, the Referer information, the browser User-Agent field, and the like, which can provide clues for tracing the source of JS DDoS attacks.
It should be understood that, in the embodiment of the present application, the names of the modules may be set according to actual needs, as long as the functions of the modules can be implemented.
According to this scheme, detection is based on the CFG and the DFG of the JS codes, which makes it possible to effectively distinguish normal JS codes from the various types of JS DDoS attack and therefore to detect JS DDoS attacks reliably. In addition, the attack detection method does not require an extra JS hook program to be implanted into the Web application; the attack detection model only needs to be trained on the server. The scheme is therefore applicable to a wide range of Web applications, adds no extra traffic consumption, and introduces no additional security risk.
Example three:
Based on the same inventive concept, the embodiments of the present application further provide an attack detection apparatus 100 and a model training apparatus 200. Referring to fig. 4 and fig. 5, fig. 4 shows an attack detection apparatus that uses the method shown in fig. 2, and fig. 5 shows a model training apparatus that uses the method shown in fig. 1. It should be understood that the specific functions of the apparatus 100 and the apparatus 200 are described above, and detailed description is omitted here where appropriate to avoid repetition. The apparatus 100 and the apparatus 200 each include at least one software functional module that can be stored in a memory in the form of software or firmware, or built into the operating system of the apparatus 100 or the apparatus 200. Specifically:
Referring to fig. 4, the apparatus 100 includes: a first obtaining module 101, a first parsing module 102, a first conversion module 103, and a detection module 104. Wherein:
the first obtaining module 101 is configured to obtain a JS code to be detected;
the first parsing module 102 is configured to parse the abstract syntax tree of the JS code to obtain a control flow graph CFG and a data flow graph DFG of the JS code;
the first conversion module 103 is configured to convert the CFG and the DFG into feature vectors;
the detection module 104 is configured to input the feature vector into a pre-trained attack detection model to obtain the type of the JS code; the type of the JS code comprises: the corresponding type of JS DDoS attack and the corresponding type of normal JS code.
In this embodiment of the application, the first conversion module 103 is specifically configured to traverse the CFG and the DFG, form a flow path of the CFG and the DFG, and perform a numerical processing on the flow path of the CFG and the DFG to obtain a feature vector corresponding to the CFG and the DFG.
In this embodiment of the application, the apparatus 100 further includes a processing module, configured to process the JS code, after the type of the JS code is obtained, according to a preset processing policy corresponding to the type.
Referring to fig. 5, the apparatus 200 includes: a second obtaining module 201, a second parsing module 202, a second conversion module 203, and a training module 204. Wherein:
the second obtaining module 201 is configured to obtain JS codes of various JS DDoS attacks and normal JS codes;
the second parsing module 202 is configured to parse the abstract syntax tree of each JS code to obtain a control flow graph CFG and a data flow graph DFG of each JS code;
the second conversion module 203 is configured to convert the CFG and the DFG of each JS code into a feature vector, and mark a label of the feature vector of each JS code according to a type of the JS code corresponding to each feature vector;
the training module 204 is configured to input each of the feature vectors of the JS codes and each of the tags of the feature vectors of the JS codes into a preset attack detection model for training, so as to obtain a trained attack detection model.
In the embodiment of the application, the attack detection model is a deep learning algorithm model.
In the embodiment of the application, the second obtaining module 201 is specifically configured to obtain the currently existing first JS codes of various JS DDoS attacks, and generate second JS codes of various JS DDoS attacks on the basis of the first JS codes of various JS DDoS attacks according to a preset code generation rule; the second JS code includes a plain JS code and/or an obfuscated JS code.
It should be understood that, for the sake of brevity, content already described in the preceding embodiments is not repeated in this embodiment.
It is also understood that the functions of the apparatus 100 and the apparatus 200 may be implemented using one or more processing chips.
Example four:
The present embodiment provides a server, as shown in fig. 6, which includes a processor 601, a memory 602, and a communication bus 603. Wherein:
the communication bus 603 is used for connection communication between the processor 601 and the memory 602.
The processor 601 is configured to execute one or more programs stored in the memory 602 to implement the attack detection method and/or the attack detection model training method in the first embodiment.
It will be appreciated that the configuration shown in fig. 6 is merely illustrative and that the server may include more or fewer components than shown in fig. 6 or have a different configuration than shown in fig. 6.
The present embodiment further provides a readable storage medium, such as a floppy disk, an optical disk, a hard disk, a flash memory, a USB flash drive, an SD (Secure Digital Memory Card) card, or an MMC (Multimedia Card). The readable storage medium stores one or more programs that can be executed by one or more processors to implement the attack detection method and/or the attack detection model training method in the first embodiment. Details are not repeated here.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
In this context, a plurality means two or more.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (9)

1. An attack detection method, comprising:
acquiring a JS code to be detected;
analyzing the abstract syntax tree of the JS code to obtain a control flow graph CFG and a data flow graph DFG of the JS code;
converting the CFG and the DFG into feature vectors;
inputting the feature vector into a pre-trained attack detection model to obtain the type of the JS code; the type of the JS code comprises: the type corresponding to JS DDoS attack and the type corresponding to normal JS code;
converting the CFG and the DFG into feature vectors, including:
traversing the CFG and the DFG to form flow paths of the CFG and the DFG; the flow paths of the CFG and the DFG are node sequences formed by traversing the CFG and the DFG according to a tree structure respectively;
and looking up the numeric ID of each character or character string in the flow paths of the CFG and the DFG from a preset reference dictionary to obtain the feature vectors corresponding to the CFG and the DFG.
2. The attack detection method of claim 1 wherein, after obtaining the type of the JS code, the method further comprises:
and processing the JS code according to a preset processing strategy corresponding to the type.
3. A method of model training, comprising:
acquiring JS codes of various JS DDoS attacks and normal JS codes;
analyzing the abstract syntax tree of each JS code to obtain a control flow graph CFG and a data flow graph DFG of each JS code;
converting the CFG and the DFG of each JS code into a feature vector, and marking a label of the feature vector of each JS code according to the type of the JS code corresponding to each feature vector;
inputting the feature vector of each JS code and the label of the feature vector of each JS code into a preset attack detection model for training to obtain a trained attack detection model;
converting the CFG and the DFG of each of the JS codes into a feature vector, including:
traversing the CFG and the DFG of each JS code to form a flow path of the CFG and the DFG; the flow paths of the CFG and the DFG are node sequences formed by traversing the CFG and the DFG according to a tree structure respectively;
and looking up the numeric ID of each character or character string in the flow paths of the CFG and the DFG from a preset reference dictionary to obtain the feature vectors corresponding to the CFG and the DFG.
4. The model training method of claim 3, wherein the attack detection model is a deep learning algorithm model.
5. The model training method of claim 3 or 4, wherein the obtaining of the JS codes of various JS DDoS attacks comprises:
acquiring a current existing first JS code of various JS DDoS attacks;
according to a preset code generation rule, generating second JS codes of various JS DDoS attacks on the basis of the first JS codes of various JS DDoS attacks; the second JS code includes a plain JS code and/or an obfuscated JS code.
6. An attack detection apparatus, comprising: the device comprises a first acquisition module, a first analysis module, a first conversion module and a detection module;
the first acquisition module is used for acquiring the JS code to be detected;
the first analysis module is used for analyzing the abstract syntax tree of the JS code to obtain a control flow graph CFG and a data flow graph DFG of the JS code;
the first conversion module is configured to convert the CFG and the DFG into feature vectors;
the detection module is used for inputting the feature vector into a pre-trained attack detection model to obtain the type of the JS code; the type of the JS code comprises: the type corresponding to JS DDoS attack and the type corresponding to normal JS code;
the first conversion module is specifically configured to traverse the CFG and the DFG to form a flow path of the CFG and the DFG, and look up a numeric ID of each character or character string in the flow path of the CFG and the DFG from a preset reference dictionary to obtain a feature vector corresponding to the CFG and the DFG; and the flow paths of the CFG and the DFG are node sequences formed by traversing the CFG and the DFG according to a tree structure respectively.
7. A model training apparatus, comprising: the system comprises a second acquisition module, a second analysis module, a second conversion module and a training module;
the second acquisition module is used for acquiring JS codes of various JS DDoS attacks and normal JS codes;
the second analysis module is used for analyzing the abstract syntax tree of each JS code to obtain a control flow graph CFG and a data flow graph DFG of each JS code;
the second conversion module is used for converting the CFG and the DFG of each JS code into a feature vector, and marking a label of the feature vector of each JS code according to the type of the JS code corresponding to each feature vector;
the training module is used for inputting the feature vector of each JS code and the label of the feature vector of each JS code into a preset attack detection model for training to obtain a trained attack detection model;
the second conversion module is specifically configured to traverse the CFG and the DFG of each JS code, form a flow path of the CFG and the DFG, and search for a numeric ID of each character or character string in the flow path of the CFG and the DFG from a preset reference dictionary to obtain a feature vector corresponding to the CFG and the DFG; and the flow paths of the CFG and the DFG are node sequences formed by traversing the CFG and the DFG according to a tree structure respectively.
8. A server, comprising: a processor, a memory, and a communication bus;
the communication bus is used for realizing connection communication between the processor and the memory;
the processor is configured to execute one or more programs stored in the memory to implement the attack detection method of claim 1 or 2, or to implement the model training method of any one of claims 3 to 5.
9. A readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the attack detection method according to any one of claims 1 or 2 or to implement the model training method according to any one of claims 3 to 5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110408265.5A CN112989348B (en) 2021-04-15 2021-04-15 Attack detection method, model training method, device, server and storage medium


Publications (2)

Publication Number Publication Date
CN112989348A CN112989348A (en) 2021-06-18
CN112989348B true CN112989348B (en) 2021-08-17

Family

ID=76340672

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110408265.5A Active CN112989348B (en) 2021-04-15 2021-04-15 Attack detection method, model training method, device, server and storage medium

Country Status (1)

Country Link
CN (1) CN112989348B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569992B (en) * 2021-08-26 2024-01-09 中国电子信息产业集团有限公司第六研究所 Abnormal data identification method and device, electronic equipment and storage medium
CN113821448A (en) * 2021-11-22 2021-12-21 上海斗象信息科技有限公司 Webshell code detection method and device and readable storage medium
CN115600216B (en) * 2022-11-04 2024-03-22 中国电信股份有限公司 Detection method, detection device, detection equipment and storage medium
CN116302043B (en) * 2023-05-25 2023-10-10 深圳市明源云科技有限公司 Code maintenance problem detection method and device, electronic equipment and readable storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111611586B (en) * 2019-02-25 2023-03-31 上海信息安全工程技术研究中心 Software vulnerability detection method and device based on graph convolution network
CN111694570A (en) * 2019-03-13 2020-09-22 南京大学 JavaScript function parameter mismatching detection method based on static program analysis
CN111090860A (en) * 2019-12-10 2020-05-01 北京邮电大学 Code vulnerability detection method and device based on deep learning
CN112003834B (en) * 2020-07-30 2022-09-23 瑞数信息技术(上海)有限公司 Abnormal behavior detection method and device
CN112416787A (en) * 2020-11-27 2021-02-26 平安普惠企业管理有限公司 JAVA-based project source code scanning analysis method, system and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
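Research and Implementation of Deep-Learning-Based JavaScript Malicious Code Detection Technology; 杨宇行; China Masters' Theses Full-text Database (Electronic Journal), Information Science and Technology; 2019-08-15; Chapters 4 and 6 *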

Also Published As

Publication number Publication date
CN112989348A (en) 2021-06-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant