CN117728995A - XSS attack detection method and device, computer equipment and storage medium - Google Patents

XSS attack detection method and device, computer equipment and storage medium

Info

Publication number
CN117728995A
Authority
CN
China
Prior art keywords
detected
model
code segment
path
code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311660661.2A
Other languages
Chinese (zh)
Inventor
盛毓桐
童则余
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi Safety Technology Co Ltd
Original Assignee
Tianyi Safety Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianyi Safety Technology Co Ltd filed Critical Tianyi Safety Technology Co Ltd
Priority to CN202311660661.2A priority Critical patent/CN117728995A/en
Publication of CN117728995A publication Critical patent/CN117728995A/en
Pending legal-status Critical Current

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses an XSS attack detection method, an XSS attack detection device, computer equipment and a storage medium. The method comprises the following steps: determining a code segment to be detected; inputting the code segment to be detected into a first model to obtain a context information feature vector corresponding to the code segment to be detected, wherein the first model is used for extracting features of the triplet context information corresponding to the code segment to be detected, and the triplet context information is obtained by path recognition on an abstract syntax tree constructed from the code segment to be detected; inputting the code segment to be detected into a second model to obtain a taint path information feature vector corresponding to the code segment to be detected, wherein the second model is used for extracting features of a sensitive code path that corresponds to the code segment to be detected and is determined based on a taint source; and inputting the context information feature vector and the taint path information feature vector into a third model to obtain a detection result of the code segment to be detected. Multiple features of the code segment are analyzed in order to improve the detection result.

Description

XSS attack detection method and device, computer equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of network security, in particular to an XSS attack detection method, an XSS attack detection device, computer equipment and a storage medium.
Background
At present, with the continuous development of technology, data transmission capability has greatly improved, and World Wide Web (Web) technology is applied ever more widely. Attacks on Web applications, such as cross-site scripting (XSS) attacks, are a major part of cyber threats and pose serious problems to many businesses, government agencies and other organizations.
In the related art, XSS attacks are generally detected by means of regular-expression matching. However, this approach does not take into account the relevant information of the code fragments in the script, resulting in a lower accuracy of XSS attack detection.
Disclosure of Invention
The embodiment of the invention provides an XSS attack detection method, an XSS attack detection device, computer equipment and a storage medium, which are used for improving the accuracy of XSS attack detection.
In a first aspect, an XSS attack detection method is provided, the method comprising:
determining a code segment to be detected;
inputting the code segment to be detected into a first model to obtain a context information feature vector corresponding to the code segment to be detected; the first model is used for extracting characteristics of the triplet context information corresponding to the code segment to be detected; the triplet context information is obtained based on path recognition of an abstract syntax tree constructed by the code segments to be detected;
inputting the code segment to be detected into a second model to obtain a taint path information feature vector corresponding to the code segment to be detected; the second model is used for extracting features of a sensitive code path which corresponds to the code segment to be detected and is determined based on a taint source;
inputting the context information feature vector and the taint path information feature vector into a third model to obtain a detection result of the code segment to be detected; the third model is used for carrying out XSS attack recognition on the context information feature vector and the taint path information feature vector.
In one possible implementation manner, inputting the code segment to be detected into a first model to obtain a context information feature vector corresponding to the code segment to be detected, including:
constructing an abstract syntax tree based on the code segment to be detected, inputting the abstract syntax tree into a sub-detection model in the first model, and obtaining triple context information comprising an initial leaf node, a target leaf node and a path node connecting the initial leaf node and the target leaf node;
and constructing a subgraph based on the triplet context information, and determining a context information feature vector based on the subgraph.
In one possible implementation manner, inputting the code segment to be detected into a second model to obtain a taint path information feature vector corresponding to the code segment to be detected includes:
constructing an abstract syntax tree, a control flow graph and a call graph based on the code segments to be detected;
performing path searching on the abstract syntax tree, the control flow graph and the call graph based on a preset path analysis algorithm to obtain all paths corresponding to the code segments to be detected; the path is used for representing the execution sequence of a group of codes;
determining whether a taint source exists in each of the paths based on a preset static taint analysis algorithm, and obtaining the paths in which a taint source exists;
and analyzing and processing the paths in which a taint source exists to obtain a taint path information feature vector corresponding to the code segment to be detected.
In one possible implementation manner, inputting the context information feature vector and the taint path information feature vector into a third model to obtain a detection result of the code segment to be detected includes:
based on a multi-layer perception mechanism in the third model, perceiving context semantic information and code grammar information in the context information feature vector and the taint path information feature vector;
and carrying out XSS attack recognition based on the context semantic information and the code grammar information to obtain a detection result of the code segment to be detected.
In one possible embodiment, the method further comprises:
and when the detection result of the code segment to be detected is that XSS attack exists, generating an alarm event, and sending the alarm event to an object associated with the code segment to be detected.
In a second aspect, an embodiment of the present invention provides an XSS attack detection apparatus, where the apparatus includes:
a determining unit for determining a code segment to be detected;
the first obtaining unit is used for inputting the code segment to be detected into a first model to obtain a context information feature vector corresponding to the code segment to be detected; the first model is used for extracting characteristics of the triplet context information corresponding to the code segment to be detected; the triplet context information is obtained based on path recognition of an abstract syntax tree constructed by the code segments to be detected;
the second obtaining unit is used for inputting the code segment to be detected into a second model to obtain a taint path information feature vector corresponding to the code segment to be detected; the second model is used for extracting features of a sensitive code path which corresponds to the code segment to be detected and is determined based on a taint source;
the processing unit is used for inputting the context information feature vector and the taint path information feature vector into a third model to obtain a detection result of the code segment to be detected; the third model is used for carrying out XSS attack recognition on the context information feature vector and the taint path information feature vector.
In a possible embodiment, the first obtaining unit is specifically configured to:
constructing an abstract syntax tree based on the code segment to be detected, inputting the abstract syntax tree into a sub-detection model in the first model, and obtaining triple context information comprising an initial leaf node, a target leaf node and a path node connecting the initial leaf node and the target leaf node;
and constructing a subgraph based on the triplet context information, and determining a context information feature vector based on the subgraph.
In a possible embodiment, the second obtaining unit is specifically configured to:
constructing an abstract syntax tree, a control flow graph and a call graph based on the code segments to be detected;
performing path searching on the abstract syntax tree, the control flow graph and the call graph based on a preset path analysis algorithm to obtain all paths corresponding to the code segments to be detected; the path is used for representing the execution sequence of a group of codes;
determining whether a taint source exists in each of the paths based on a preset static taint analysis algorithm, and obtaining the paths in which a taint source exists;
and analyzing and processing the paths in which a taint source exists to obtain a taint path information feature vector corresponding to the code segment to be detected.
In a possible embodiment, the processing unit is specifically configured to:
based on a multi-layer perception mechanism in the third model, perceiving context semantic information and code grammar information in the context information feature vector and the taint path information feature vector;
and carrying out XSS attack recognition based on the context semantic information and the code grammar information to obtain a detection result of the code segment to be detected.
In a possible implementation manner, the device further comprises a prompting unit, configured to:
and when the detection result of the code segment to be detected is that XSS attack exists, generating an alarm event, and sending the alarm event to an object associated with the code segment to be detected.
In a third aspect, there is provided a computer device comprising:
a memory for storing program instructions;
and a processor for calling program instructions stored in the memory, and executing steps comprised in any one of the methods of the first aspect according to the obtained program instructions.
In a fourth aspect, there is provided a computer readable storage medium having stored thereon a computer program for execution by a processor to perform the steps comprised by any of the methods as in the first aspect.
In a fifth aspect, there is provided a computer program product enabling a computer device to carry out the steps comprised by any of the methods of the first aspect when said computer program product is run on the computer device.
The technical scheme provided by the embodiment of the invention at least has the following beneficial effects:
In the embodiment of the invention, the code segment to be detected is determined first. The code segment to be detected is input into a first model to obtain a context information feature vector corresponding to the code segment to be detected; the first model is used for extracting features of the triplet context information corresponding to the code segment to be detected, and the triplet context information is obtained by path recognition on an abstract syntax tree constructed from the code segment to be detected. The code segment to be detected is input into a second model to obtain a taint path information feature vector corresponding to the code segment to be detected; the second model is used for extracting features of a sensitive code path which corresponds to the code segment to be detected and is determined based on a taint source. Finally, the context information feature vector and the taint path information feature vector are input into a third model to obtain a detection result of the code segment to be detected; the third model is used for carrying out XSS attack recognition on the context information feature vector and the taint path information feature vector.
Therefore, in the embodiment of the invention, the intrinsic structure, the semantics and the data flow information of the code to be detected can be deeply learned and understood, so that more comprehensive and deeper context information feature vectors and stain path information feature vectors are obtained, and the accuracy and the robustness of the detection of complex and secret XSS attacks are greatly improved. Meanwhile, the third model can automatically detect, so that the safety of the code fragments and the detection efficiency of XSS attack on the code fragments are effectively improved.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention and do not constitute an undue limitation on the invention.
FIG. 1 is a schematic diagram of an application scenario in an embodiment of the present invention;
FIG. 2 is a flowchart of an XSS attack detection method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a first model obtained in an embodiment of the present invention;
FIG. 4 is a schematic diagram of a second model obtained in an embodiment of the present invention;
FIG. 5 is a block diagram illustrating an XSS attack detection apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention. Embodiments of the invention and features of the embodiments may be combined with one another arbitrarily without conflict. Also, while a logical order is depicted in the flowchart, in some cases, the steps depicted or described may be performed in a different order than presented herein.
The terms "first" and "second" in the description and claims of the invention and in the above-mentioned figures are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the term "include" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
In order to facilitate a better understanding of the technical solutions of the present invention, the following description of the terms related to the present invention will be presented to those skilled in the art.
1. Cross-site scripting (Cross Site Scripting, XSS) attack: a common network security threat. When a web application lacks proper data validation and filtering, an attacker can "inject" malicious script code into it; the malicious script is then executed in the browser or on the mobile device of other users when they browse the injected page.
2. Programming Language Processing (PLP): programming language processing is a series of computational methods that include the steps of parsing code, extracting useful information (e.g., abstract syntax trees), and understanding code semantics. Programming language processing is the basis for many fields of computer science, such as programming language design, construction of compilers and interpreters, optimization of programs, formal methods and verification, program testing and debugging, and automation of program understanding and maintenance.
3. Abstract syntax tree (Abstract Syntax Tree, AST): in an embodiment of the present invention, the abstract syntax tree is a tree representation of source code that reflects the syntax structure of the programming language. Each node in an AST represents a certain structure in the source code, such as a function, a loop, a conditional statement, etc., with which the structure of the code can be intuitively examined and analyzed.
4. Graph convolutional network (Graph Convolutional Network, GCN): a graph neural network mainly used for processing graph-structured data. It captures the relationships between nodes (e.g., AST nodes in embodiments of the invention) by performing convolution operations on the graph data, thereby mining deeper features.
5. Taint analysis (Taint Analysis): a technique for dynamically or statically analyzing a program to track the flow of data through the program. Its purpose is to find possible data leaks, such as illegal transmission of sensitive information, or potential vulnerabilities in the code.
6. Transformer: a deep learning model that is particularly prominent in the field of natural language processing, because its self-attention structure enables it to capture long-range dependencies in input data (e.g., source code in embodiments of the invention).
7. Multilayer perceptron (MLP): is a basic deep learning model composed of neural networks, typically used to deal with classification and regression problems. In the embodiment of the invention, the MLP is used for analyzing whether XSS attacks exist in codes according to the extracted features.
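As an illustration of the static taint analysis mentioned in item 5, the following minimal Python sketch propagates taint through a straight-line sequence of statements. The statement encoding (`"assign"` / `"call"` tuples), the function name `find_tainted_sinks`, and the example source and sink names are all assumptions made for illustration; they are not taken from the embodiment, and sanitizers and branching are not modeled.

```python
def find_tainted_sinks(statements, sources, sinks):
    """Return the indices of sink calls that receive tainted data.

    statements: list of (kind, name, operands) tuples, where kind is
                "assign" (name = f(operands)) or "call" (name(operands)).
    sources:    names treated as attacker-controlled inputs (assumption).
    sinks:      names of functions that are dangerous to call with tainted data.
    """
    tainted = set(sources)
    hits = []
    for index, (kind, name, operands) in enumerate(statements):
        flows = any(operand in tainted for operand in operands)
        if kind == "assign":
            if flows:
                tainted.add(name)        # taint propagates to the assigned variable
            else:
                tainted.discard(name)    # variable overwritten with clean data
        elif kind == "call" and name in sinks and flows:
            hits.append(index)           # tainted data reaches a sink
    return hits


# Hypothetical three-address form of a small script.
example = [
    ("assign", "q", ["location.hash"]),
    ("assign", "msg", ["q"]),
    ("assign", "safe", ["'hello'"]),
    ("call", "document.write", ["msg"]),
    ("call", "document.write", ["safe"]),
]
print(find_tainted_sinks(example, {"location.hash"}, {"document.write"}))
```

Only the first `document.write` call is reported, because `msg` is derived from the source while `safe` is not.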
As described above, in the related art, XSS attack is generally detected by means of regular matching. However, such a manner does not take into account the relevant information of the code fragments in the script, resulting in a lower accuracy of detection of XSS attacks.
In view of this, the embodiment of the invention provides an XSS attack detection method in which the code segment to be detected is determined first. The code segment to be detected is input into a first model to obtain a context information feature vector corresponding to the code segment to be detected; the first model is used for extracting features of the triplet context information corresponding to the code segment to be detected, and the triplet context information is obtained by path recognition on an abstract syntax tree constructed from the code segment to be detected. The code segment to be detected is input into a second model to obtain a taint path information feature vector corresponding to the code segment to be detected; the second model is used for extracting features of a sensitive code path which corresponds to the code segment to be detected and is determined based on a taint source. Finally, the context information feature vector and the taint path information feature vector are input into a third model to obtain a detection result of the code segment to be detected; the third model is used for carrying out XSS attack recognition on the context information feature vector and the taint path information feature vector.
Therefore, in the embodiment of the invention, the intrinsic structure, the semantics and the data flow information of the code to be detected can be deeply learned and understood, so that more comprehensive and deeper context information feature vectors and stain path information feature vectors are obtained, and the accuracy and the robustness of the detection of complex and secret XSS attacks are greatly improved. Meanwhile, the third model can automatically detect, so that the safety of the code fragments and the detection efficiency of XSS attack on the code fragments are effectively improved.
After the design concept of the embodiment of the present invention is introduced, some simple descriptions are provided for application scenarios suitable for the technical solution in the embodiment of the present invention, and it should be noted that, the application scenarios described in the embodiment of the present invention are for more clearly describing the technical solution of the embodiment of the present invention, and do not constitute a limitation on the technical solution provided by the embodiment of the present invention, and as a new application scenario appears, those skilled in the art can know that the technical solution provided by the embodiment of the present invention is equally suitable for similar technical problems.
In the embodiment of the invention, the XSS attack detection method can be applied to monitoring online code behavior. Specifically, the XSS attack detection method is integrated with a corresponding code base. Each time someone submits new code, the computer equipment can detect the new code; when a potential XSS attack risk is detected, an alarm event can be generated and sent to the corresponding responsible personnel, so that potential code defects can be found and the safety of software and programs is improved.
Referring to fig. 1, a schematic view of a scenario in which an embodiment of the invention is applicable includes an information collecting device 101, a computer device 102 and an associated electronic device 103, and the XSS attack detection method in the embodiment of the invention may be implemented by cooperation of the information collecting device 101 and the computer device 102 in fig. 1. And, the computer device 102 may send the XSS attack detection result to the associated electronic device 103, so that the corresponding person processes the code segment in which the XSS attack exists.
In a specific implementation, the information collecting device 101 may obtain the code segment and/or the source code. After the information collection device 101 collects the code segments and/or source code, the code segments and/or source code may be transmitted to the computer device 102 via the network 104.
The computer device 102 may include, among other things, one or more processors 1021, memory 1022, an I/O interface 1023 that interacts with the information acquisition device 101, and an I/O interface 1024 that interacts with an associated electronic device 103. In a specific implementation process, a plurality of computer devices 102 may interact with a plurality of information collecting devices 101, or one computer device 102 may interact with one information collecting device 101, which is not limited in the embodiment of the present invention. Specifically, the computer device 102 may also be connected to an associated electronic device 103, and feedback information about XSS attack to the associated electronic device 103, where the associated electronic device 103 is, for example, a device used by a manager associated with the source code, and fig. 1 illustrates an interaction between one computer device 102 and one information collecting device 101 and one associated electronic device 103.
In an embodiment of the present invention, the computer device 102 may receive the code segments and/or source code transmitted by the information collecting device 101 through the I/O interface 1023, then process the code segments and/or source code with the processor 1021, and store the processed information in the memory 1022. The computer device 102 can also send alert information to the associated electronic device 103 via the I/O interface 1024; the alert information indicates that an XSS attack exists in the code segments and/or source code.
The information collecting device 101 and the computer device 102 may be communicatively coupled via one or more networks 104. The associated electronic device 103 and the computer device 102 may also be communicatively coupled via one or more networks 104. The network 104 may be a wired network or a wireless network, for example a mobile cellular network or a Wireless Fidelity (Wi-Fi) network, or may be another possible network, which is not limited in the embodiments of the present invention.
In order to further explain the scheme of the XSS attack detection method provided by the embodiment of the present invention, the following details are described with reference to the accompanying drawings and the specific embodiments. Although embodiments of the present invention provide the method operational steps shown in the following embodiments or figures, more or fewer operational steps may be included in the method, either on a routine or non-inventive basis. In steps where there is logically no necessary causal relationship, the execution order of the steps is not limited to the execution order provided by the embodiments of the present invention. The methods may be performed sequentially or in parallel (e.g., parallel processor or multi-threaded processing application environments) as shown in the embodiments or figures when the methods are performed in the actual process or apparatus.
The XSS attack detection method according to the embodiment of the present invention is described below with reference to the flowchart of the method shown in fig. 2, where each step shown in fig. 2 may be executed by a computer device as shown in fig. 1. In an implementation, the computer device may be a server, such as a personal computer, a midrange computer, a cluster of computers, and so forth.
Step 201: and determining the code segment to be detected.
In an embodiment of the present invention, the computer device may first determine the code segment to be detected. The code segment to be detected is, for example, a source code sent by the information acquisition device, or a screened one of a plurality of source codes sent by the information acquisition device, or may be a code segment sent by the information acquisition device, which is not limited in the embodiment of the present invention.
In the embodiment of the invention, after the computer equipment determines the code segment to be detected, the first model, the second model and the third model can be adopted to process the code segment to be detected, so that the detection result of the code segment to be detected is obtained.
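The way the three models cooperate can be sketched as follows. This is only a minimal illustration under stated assumptions: the two feature vectors are fused by simple concatenation, the third model is a small multilayer perceptron with ReLU hidden units and a sigmoid output, and the names `mlp_forward`, `detect`, `layers` and `threshold` are hypothetical. The embodiment does not specify the fusion method, the network shape or any weights.

```python
import math


def mlp_forward(features, layers):
    """Forward pass of a small MLP.

    layers is a list of (weights, biases) pairs; hidden layers use ReLU and
    the final layer uses a sigmoid (both choices are assumptions).
    """
    activations = features
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activations)) + b
             for row, b in zip(weights, biases)]
        if i == len(layers) - 1:
            activations = [1.0 / (1.0 + math.exp(-v)) for v in z]
        else:
            activations = [max(0.0, v) for v in z]
    return activations


def detect(code, first_model, second_model, layers, threshold=0.5):
    """Run the two feature extractors, fuse their outputs, and classify."""
    context_vec = first_model(code)    # context information feature vector
    taint_vec = second_model(code)     # taint path information feature vector
    score = mlp_forward(context_vec + taint_vec, layers)[0]  # list concatenation
    return score >= threshold, score


# Stand-in extractors and hand-picked weights, purely for demonstration.
first = lambda code: [1.0, 0.0]
second = lambda code: [1.0] if "document.write" in code else [0.0]
layers = [
    ([[1.0, 1.0, 1.0], [0.0, 0.0, 1.0]], [0.0, 0.0]),  # hidden layer: 3 -> 2
    ([[1.0, 2.0]], [-3.0]),                             # output layer: 2 -> 1
]
print(detect("document.write(q)", first, second, layers))
```

In a real system the two extractors would be the trained first and second models, and the weights in `layers` would be learned rather than written by hand.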
Before describing the processing of the code segment to be detected by the first model, the second model and the third model, the following processes of obtaining the first model, the second model and the third model are described.
For example, referring to fig. 3, fig. 3 is a schematic diagram illustrating a process of obtaining a first model according to an embodiment of the invention. In an embodiment of the present invention, the data set may be determined first. Specifically, the computer device may obtain the set of code fragments including the latest XSS attack first, and then perform data processing on the set of code fragments including the latest XSS attack to obtain the data set.
Optionally, the data processing consists of a data cleansing operation and a normalization operation performed on the set of code fragments including the latest XSS attacks. The data cleansing operation can be understood as using regular-expression matching to remove unnecessary tags, comments, invalid characters and special symbols such as spaces, tabs and line feeds, which improves the consistency of the data and reduces redundant, useless information. The normalization operation can be understood as abstracting the descriptive variable names and function names in the code fragments and renaming them uniformly as generic symbols or identifiers. For example, variables of the string type are named str_1, str_2, and so on, and functions are named func_1, func_2, and so on, which reduces noise interference during feature extraction.
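The cleansing and normalization operations can be sketched in Python as follows. This is a simplified illustration that assumes JavaScript-like input. The function names `clean` and `normalize` are hypothetical, and declared variables are renamed `var_N` rather than `str_N`, because this sketch does not infer variable types. The regular expressions are deliberately naive: they would also strip `//` inside URLs and rename identifiers that appear inside string literals.

```python
import re


def clean(code: str) -> str:
    """Remove comments and collapse runs of whitespace (simplified cleansing)."""
    code = re.sub(r"/\*.*?\*/", " ", code, flags=re.DOTALL)  # block comments
    code = re.sub(r"//[^\n]*", " ", code)                    # line comments
    return re.sub(r"\s+", " ", code).strip()                 # spaces, tabs, newlines


def normalize(code: str) -> str:
    """Rename declared functions to func_N and declared variables to var_N."""
    mapping = {}
    counters = {"func": 0, "var": 0}
    declaration = re.compile(r"\b(function|var|let|const)\s+([A-Za-z_]\w*)")
    for keyword, name in declaration.findall(code):
        if name in mapping:
            continue
        prefix = "func" if keyword == "function" else "var"
        counters[prefix] += 1
        mapping[name] = f"{prefix}_{counters[prefix]}"
    if not mapping:
        return code
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda match: mapping[match.group(1)], code)


raw = """// greet the user
function showName(userInput) {
    var message = "Hi " + userInput;   /* build text */
    document.write(message);
}
"""
print(normalize(clean(raw)))
```

Parameters such as `userInput` are left unchanged here because only names introduced by `function`, `var`, `let` or `const` are collected; a fuller implementation would work on the parsed syntax tree instead of on raw text.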
In the embodiment of the invention, after the data set is obtained, each code segment in the data set can be parsed into a corresponding abstract syntax tree structure. All the abstract syntax tree structures obtained by parsing are then used as input, and the corresponding path context information is extracted from them with Code2Vec by means of programming language processing (PLP). Each piece of path context information is a triplet, for example (source, path, target), that is, an initial leaf node, a target leaf node, and the set of path nodes joining them.
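The (source, path, target) triplets can be illustrated with a small Python sketch. It does not use the Code2Vec tool itself; as an assumption for illustration, Python's standard `ast` module stands in for the parser, identifiers and constants are treated as leaves, and the hypothetical function `path_contexts` connects every pair of leaves through their lowest common ancestor.

```python
import ast
from itertools import combinations


def collect_leaves(tree):
    """Return (token, [ancestor nodes from root to leaf]) for each leaf."""
    leaves = []

    def visit(node, trail):
        trail = trail + [node]
        if isinstance(node, ast.Name):
            leaves.append((node.id, trail))
        elif isinstance(node, ast.Constant):
            leaves.append((repr(node.value), trail))
        else:
            for child in ast.iter_child_nodes(node):
                visit(child, trail)

    visit(tree, [])
    return leaves


def path_contexts(source):
    """Extract (source_leaf, path, target_leaf) triplets from Python source."""
    contexts = []
    for (src, up), (dst, down) in combinations(collect_leaves(ast.parse(source)), 2):
        k = 0
        while k < min(len(up), len(down)) and up[k] is down[k]:
            k += 1                       # up[k - 1] is the lowest common ancestor
        climb = [type(n).__name__ for n in reversed(up[k - 1:])]  # leaf up to ancestor
        descend = [type(n).__name__ for n in down[k:]]            # ancestor down to leaf
        contexts.append((src, tuple(climb + descend), dst))
    return contexts


for triplet in path_contexts("x = y + 1"):
    print(triplet)
```

For the statement `x = y + 1`, one of the extracted triplets is `('x', ('Name', 'Assign', 'BinOp', 'Name'), 'y')`: the path climbs from the leaf `x` to the assignment and descends through the binary operation to the leaf `y`.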
In the embodiment of the invention, a source code segment (source) in each triplet can be taken as a node, a path (path) is taken as an edge of the graph, and the source node and a target code (target) node are connected to obtain a corresponding sub-graph. And inputting all obtained subgraphs into a preset first model for training to obtain the first model. The preset first model is, for example, a neural network model. That is, the first model in the embodiment of the present invention is a trained model, and the first model may output the context information feature vector corresponding to the code segment.
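Building a subgraph from the triplets can be sketched as follows. The representation (a node-index dictionary, a labeled edge list, and an undirected adjacency matrix of the kind a graph neural network could consume) is an assumption for illustration, as are the function names `build_subgraph` and `adjacency_matrix`; the embodiment does not prescribe a data structure.

```python
def build_subgraph(triplets):
    """Turn (source, path, target) triplets into indexed nodes and labeled edges."""
    nodes = {}
    edges = []
    for source, path, target in triplets:
        for token in (source, target):
            nodes.setdefault(token, len(nodes))   # assign the next free index
        edges.append((nodes[source], nodes[target], path))
    return nodes, edges


def adjacency_matrix(nodes, edges):
    """Undirected 0/1 adjacency matrix (treating edges as symmetric is an assumption)."""
    size = len(nodes)
    matrix = [[0] * size for _ in range(size)]
    for src, dst, _path in edges:
        matrix[src][dst] = 1
        matrix[dst][src] = 1
    return matrix


triplets = [
    ("x", ("Name", "Assign", "BinOp", "Name"), "y"),
    ("x", ("Name", "Assign", "BinOp", "Constant"), "1"),
    ("y", ("Name", "BinOp", "Constant"), "1"),
]
nodes, edges = build_subgraph(triplets)
print(nodes)
print(adjacency_matrix(nodes, edges))
```

The path label is kept on each edge so that a downstream model can embed it as an edge feature instead of discarding it.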
Step 202: inputting the code segment to be detected into a first model to obtain a context information feature vector corresponding to the code segment to be detected; the first model is used for extracting features of the triplet context information corresponding to the code segment to be detected; the triplet context information is obtained based on path recognition of an abstract syntax tree constructed from the code segment to be detected. In the embodiment of the invention, the first model can construct an abstract syntax tree based on the code segment to be detected and input the abstract syntax tree into a sub-detection model in the first model to obtain triplet context information comprising an initial leaf node, a target leaf node and the path nodes connecting the initial leaf node and the target leaf node; a subgraph is then constructed based on the triplet context information, and the context information feature vector is determined based on the subgraph.
Step 203: inputting the code segment to be detected into a second model to obtain a taint path information feature vector corresponding to the code segment to be detected; the second model is used for extracting features of a sensitive code path which corresponds to the code segment to be detected and is determined based on a taint source.
In the embodiment of the present invention, the second model may be obtained in the following manner. Specifically, after the data set is obtained, an abstract syntax tree (AST), a control flow graph (Control Flow Graph, CFG) and a call graph (Call Graph, CG) corresponding to each code segment are constructed in sequence from the code segments in the data set. A CFG is a graph representing the execution flow of a program within one method; its nodes are statements (or instructions), and its directed edges represent the order of execution. For example, if the next statement after statement A is statement B, then there is a directed edge from A to B in the CFG. A CG is a graph representing the call relationships between functions in the whole program; its nodes are methods and its edges represent call relationships. For example, if method 1 calls method 2, there should be a directed edge from method 1 to method 2 in the CG. Note that each node in the above graphs (i.e., the AST, CFG, and CG) represents a basic block. Then, based on the AST, CFG and CG, and in combination with a sensitive path analysis algorithm, paths of the code segment are constructed within basic blocks, between basic blocks and between function calls, so that all path information is obtained.
In the embodiment of the present invention, path search within a basic block can be understood as recording the entire path starting from the entry of the basic block. The statements inside the basic block are traversed, and path information is recorded step by step in execution order. This captures the sensitive paths within the basic block.
In the embodiment of the present invention, path search between basic blocks can be understood as searching all predecessor blocks of the current basic block and recording a new path. When one basic block is finished, the sensitive path analysis algorithm examines the successor blocks of the current basic block, copies the current path to a new path, and continues the search in each successor block. This captures the sensitive paths between basic blocks.
In the embodiment of the present invention, path search between function calls can be understood as finding the caller of the current function and recording a new path. When a function call statement is encountered, the sensitive path analysis algorithm tracks the called function, copies the current path to a new path, and continues the search in the called function. In this way, sensitive paths between function calls can be captured, and a path set is constructed from the paths within basic blocks, the paths between basic blocks, and the paths between function calls obtained as described above.
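The three levels of path search can be sketched together as one depth-first enumeration. This is only an illustrative sketch under stated assumptions: the dictionary layout of `functions`, the `"call <name>"` statement convention, and the rule of visiting each block at most once per path (to terminate on loops) are all hypothetical choices, not details given in the description.

```python
def enumerate_paths(functions, name, active=()):
    """Return every statement path through function `name`.

    `functions` maps a function name to a dict with an "entry" block id,
    "blocks" (block id -> list of statements) and "succ" (CFG successors).
    """
    func = functions[name]
    active = active + (name,)             # call stack, guards against recursion
    finished = []

    def walk(block, path, seen):
        paths = [path]
        # Within a basic block: record statements in execution order.
        for stmt in func["blocks"][block]:
            callee = stmt[5:] if stmt.startswith("call ") else None
            if callee in functions and callee not in active:
                # Between function calls: continue the path inside the callee.
                inner = enumerate_paths(functions, callee, active)
                paths = [p + [stmt] + q for p in paths for q in inner]
            else:
                paths = [p + [stmt] for p in paths]
        # Between basic blocks: copy the path for each unvisited successor.
        successors = [b for b in func["succ"].get(block, []) if b not in seen]
        for p in paths:
            if not successors:
                finished.append(p)
            for b in successors:
                walk(b, list(p), seen | {b})

    walk(func["entry"], [], frozenset([func["entry"]]))
    return finished

# Hypothetical program: main() optionally logs, and always calls render().
functions = {
    "main": {
        "entry": "B0",
        "blocks": {"B0": ["q = location.search", "call render"],
                   "B1": ["log(q)"],
                   "B2": ["return"]},
        "succ": {"B0": ["B1", "B2"], "B1": ["B2"]},
    },
    "render": {"entry": "R0", "blocks": {"R0": ["el.innerHTML = q"]}, "succ": {}},
}
all_paths = enumerate_paths(functions, "main")
```

In this example the branch at `B0` yields two paths, and both pass through the statement inside `render`.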
In the embodiment of the invention, after the path set is obtained, a static taint analysis (Taint Analysis) method can be used to obtain the paths that contain a taint source. Optionally, user input, database queries and the like may be regarded as taint sources. Whether a path contains a taint source can be identified based on variables assigned from the taint source, functions to which it is passed as a parameter, return values of functions, document object model (Document Object Model, DOM) operations, and the like. Optionally, for each path that contains a taint source, the code statements corresponding to the path may be segmented into tokens and vectorized by Word2Vec; after the position feature is calculated, the two are added to obtain a feature matrix, which is the representation of that path and serves as the input of the second model to be trained. The second model to be trained is, for example, a Transformer model. For example, referring to fig. 4, fig. 4 is a training schematic diagram of a second model according to an embodiment of the present invention. In fig. 4, the Transformer model mainly comprises two parts, an encoder (Encoder) and a decoder (Decoder), each of which comprises 6 basic units. Each basic unit of the encoder comprises a multi-head attention mechanism, add & normalize, a feedforward layer, and add & normalize; each basic unit of the decoder comprises a masked multi-head attention mechanism, add & normalize, a feedforward layer, and add & normalize.
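The construction of the feature matrix (token vectors plus position features) can be sketched as below. Two assumptions are made loudly here: `toy_embedding` is a deterministic stand-in and is not Word2Vec, and the sinusoidal position feature is borrowed from the standard Transformer because the description does not say which position feature is used.

```python
import math
import re

def tokenize(statement):
    """Split a code statement into identifier, number and symbol tokens."""
    return re.findall(r"[A-Za-z_$][\w$]*|\d+|\S", statement)

def toy_embedding(token, dim):
    """Deterministic stand-in for a trained Word2Vec lookup (NOT Word2Vec)."""
    seed = sum(ord(c) * (i + 1) for i, c in enumerate(token))
    return [math.sin(seed * (j + 1)) for j in range(dim)]

def positional_encoding(pos, dim):
    """Sinusoidal position feature (assumed; standard Transformer form)."""
    return [
        math.sin(pos / 10000 ** (j / dim)) if j % 2 == 0
        else math.cos(pos / 10000 ** ((j - 1) / dim))
        for j in range(dim)
    ]

def feature_matrix(statement, dim=8):
    """One row per token: token vector plus position feature."""
    rows = []
    for pos, token in enumerate(tokenize(statement)):
        emb = toy_embedding(token, dim)
        pe = positional_encoding(pos, dim)
        rows.append([e + p for e, p in zip(emb, pe)])
    return rows

matrix = feature_matrix("el.innerHTML = q;")
```

In a real pipeline the stand-in lookup would be replaced by vectors from a Word2Vec model trained on the tokenized paths.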
In the embodiment of the invention, the obtained feature matrix can be input into the encoder, which mainly comprises a multi-head attention layer (Multi-Head Attention), residual connections, normalization layers and a feedforward layer. The multi-head attention layer is formed by combining a plurality of self-attention layers. From the input feature matrix, the attention output can be obtained according to the following formula:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
where Q, K and V are the matrices obtained by linearly transforming the input feature matrix, and d_k is the vector dimension; the result is the output of a single self-attention layer. The attention coefficient matrix QK^T/√d_k represents the correlation between different positions in the input sequence. The attention coefficient matrix is then normalized to obtain the attention weight matrix, and the attention weight matrix is multiplied by the value matrix to obtain a feature representation weighted by the attention mechanism. To enhance the expressive power of the model, the Transformer model introduces a multi-head self-attention mechanism. In the multi-head self-attention layer, several groups of different query, key and value matrices are obtained by applying different linear transformations to the query, key and value matrices. Each group of matrices can be regarded as a feature representation learned by a different "attention head". The final feature representation is obtained by computing the attention weight matrix of each attention head and combining the results by weighted summation. Then, in the feedforward layer, the feature representation is further mapped to a higher-dimensional space and transformed non-linearly by an activation function. The feedforward neural network layer typically consists of two linear transformations and an activation function such as ReLU. By stacking multiple encoder layers, the model can progressively extract higher-level feature representations of the input sequence. Each encoder layer can be regarded as one level of abstraction of the input features, capturing a different level of semantic and contextual information. Further, the encoded information matrix may be input into the decoder stack to output the learned feature representation, i.e., the taint path information feature vector.
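A single self-attention head can be sketched in plain Python as follows. The sketch assumes the scaled dot-product attention of the standard Transformer, which the description of Q, K, V and d_k appears to follow; the helper names and the tiny example matrices are illustrative only.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def attention(q, k, v):
    """Return (output, weights) of one scaled dot-product attention head."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))                  # attention coefficients
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights                # weights applied to values

# One query that is equally similar to both keys.
out, weights = attention(
    q=[[0.0, 0.0]],
    k=[[1.0, 0.0], [0.0, 1.0]],
    v=[[2.0, 0.0], [0.0, 4.0]],
)
```

Because the query scores both keys equally, the weights are uniform and the output is the average of the two value rows.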
Therefore, the second model in the embodiment of the invention is a model obtained by training the preset Transformer model, and the second model can process an input code segment to finally obtain the taint path information feature vector.
In the embodiment of the invention, the second model can construct an abstract syntax tree, a control flow graph and a call graph based on the code segment to be detected; perform path search on the abstract syntax tree, the control flow graph and the call graph based on a preset path analysis algorithm to obtain all paths corresponding to the code segment to be detected, where a path represents the execution order of a group of code statements; determine, based on a preset static taint analysis algorithm, whether a taint source exists in each of the paths, to obtain the paths that contain a taint source; and analyze and process the paths that contain a taint source to obtain the taint path information feature vector corresponding to the code segment to be detected.
Step 204: inputting the context information feature vector and the taint path information feature vector into a third model to obtain a detection result of the code segment to be detected; the third model is used for performing XSS attack recognition on the context information feature vector and the taint path information feature vector.
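The filtering of paths by taint source can be sketched as below. This is a deliberately simplified illustration: the list of source expressions, the string-based handling of assignments, and the function names are assumptions, and a real static taint analysis would work on the AST rather than on statement text.

```python
import re

# Assumed examples of user-controlled inputs in browser-side code.
TAINT_SOURCES = ("location.search", "location.hash", "document.cookie")

def has_taint_source(path):
    """True if any statement on the path reads from a taint source."""
    return any(src in stmt for stmt in path for src in TAINT_SOURCES)

def tainted_variables(path):
    """Propagate taint through simple `lhs = rhs` assignments, in order."""
    tainted = set()
    for stmt in path:
        if "=" not in stmt:
            continue
        lhs, rhs = (part.strip() for part in stmt.split("=", 1))
        names = set(re.findall(r"[A-Za-z_$][\w$.]*", rhs))
        if any(src in rhs for src in TAINT_SOURCES) or names & tainted:
            tainted.add(lhs)
    return tainted

def paths_with_taint_source(paths):
    """Keep only the paths that contain a taint source."""
    return [p for p in paths if has_taint_source(p)]

candidate_paths = [
    ["q = location.search", "msg = q", "el.innerHTML = msg"],
    ["x = 1", "el.textContent = x"],
]
kept = paths_with_taint_source(candidate_paths)
```

Only the first path is kept, and taint propagates from `q` through `msg` to the `el.innerHTML` assignment.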
In the embodiment of the invention, a preset multilayer perceptron (Multilayer Perceptron, MLP) model can be trained on the obtained data set and the feature vectors output by the first model and the second model, so as to obtain the third model. The preset MLP model comprises two fully connected hidden layers and a dropout layer, which operate on the feature vectors from the input layer; the result is passed through a Softmax activation function to obtain a result indicating whether an XSS attack exists.
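A forward pass through such a classifier can be sketched as follows. Several details are assumptions not stated in the description: the two feature vectors are fused by concatenation, the hidden layers use ReLU, the layer sizes are arbitrary, and the weights are random rather than trained.

```python
import math
import random

def linear(x, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(xi * wi for xi, wi in zip(x, row)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def softmax(x):
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_in, n_out, rng):
    """Random untrained weights, for illustration only."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out

def mlp_forward(context_vec, taint_vec, layers, drop_rate=0.0, rng=None):
    """Return [P(no XSS), P(XSS)] for the fused feature vector."""
    x = list(context_vec) + list(taint_vec)           # fusion by concatenation (assumed)
    (w1, b1), (w2, b2), (w3, b3) = layers
    h = relu(linear(x, w1, b1))                       # hidden layer 1
    h = relu(linear(h, w2, b2))                       # hidden layer 2
    if rng is not None and drop_rate > 0.0:           # dropout, training time only
        h = [0.0 if rng.random() < drop_rate else v / (1.0 - drop_rate) for v in h]
    return softmax(linear(h, w3, b3))

rng = random.Random(0)
layers = [init_layer(3, 4, rng), init_layer(4, 4, rng), init_layer(4, 2, rng)]
probs = mlp_forward([0.1, 0.2], [0.3], layers)
```

At inference time dropout is disabled (the default here), so the output depends only on the weights and the input vectors.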
In the embodiment of the invention, the computer device can, based on the multilayer perceptron in the third model, perceive the context semantic information and the code grammar information in the context information feature vector and the taint path information feature vector, and perform XSS attack recognition based on the context semantic information and the code grammar information to obtain the detection result of the code segment to be detected.
In the embodiment of the invention, for the context information feature vector, the third model can identify the context semantic information in the code segment to be detected and judge whether the identified context semantic information belongs to the context semantic set corresponding to XSS attacks, thereby further identifying whether potential attack paths and vulnerabilities exist. For the taint path information feature vector, the third model can identify the code grammar information in the code segment to be detected and judge whether the identified code grammar information belongs to the grammar information set corresponding to XSS attacks, thereby further judging whether the code grammar information in the code segment to be detected has the characteristics of an XSS attack. The context semantic set and the grammar information set are updated accordingly based on the actual implementation.
Further, the detection result is obtained by combining the recognition result for the context semantic information and the recognition result for the code grammar information with their respective weights. The detection result comprises a value indicating the degree to which an XSS attack is present; when the value is larger than a preset threshold, it is determined that an XSS attack exists in the code segment to be detected.
In the embodiment of the invention, when the detection result of the code segment to be detected is that an XSS attack exists, an alarm event is generated, and the alarm event is sent to an object associated with the code segment to be detected.
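The weighted combination, threshold decision and alarm generation can be sketched as below. The weights, the threshold of 0.5, the `AlarmEvent` fields and the recipient address are all hypothetical values chosen for the example; the description leaves them to the implementation.

```python
from dataclasses import dataclass

@dataclass
class AlarmEvent:
    segment_id: str       # identifies the code segment that was checked
    score: float          # combined XSS score
    recipient: str        # object associated with the code segment

def detect(context_score, grammar_score, w_context=0.5, w_grammar=0.5, threshold=0.5):
    """Combine the two recognition results and compare with the threshold."""
    score = w_context * context_score + w_grammar * grammar_score
    return score, score > threshold

def maybe_alarm(segment_id, owner, context_score, grammar_score):
    """Return an AlarmEvent when an XSS attack is detected, else None."""
    score, is_xss = detect(context_score, grammar_score)
    return AlarmEvent(segment_id, score, owner) if is_xss else None

event = maybe_alarm("seg-1", "owner@example.test", 0.9, 0.7)
quiet = maybe_alarm("seg-2", "owner@example.test", 0.2, 0.4)
```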
In the embodiment of the invention, the AST is feature-encoded by means of a graph convolutional network (GCN); that is, the first model can understand and learn the structural information of the source code and obtain a higher-order feature representation. Taint analysis (Taint Analysis) is applied to track the data flow of the source code, while a Transformer model is used to understand the context information of the source code. In this way, the structure, semantics and data flow information embedded in the source code can be accurately obtained to identify complex data dependencies and flows in the code. Further, the AST features extracted by the GCN and the features extracted by the Transformer are fused to obtain fused features, which are input into a multilayer perceptron (MLP) classifier to predict whether the source code is at risk of an XSS attack.
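One GCN layer over a small graph can be sketched as follows. The description does not state which GCN variant is used, so this sketch assumes the common symmetric-normalization form, H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W); the two-node graph and identity weight matrix are illustrative only.

```python
import math

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(adj, feats, weight):
    """Apply one graph convolution with self-loops and symmetric normalization."""
    n = len(adj)
    # Add self-loops so that each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    aggregated = matmul(norm, feats)                  # mix neighbour features
    return [[max(0.0, v) for v in row] for row in matmul(aggregated, weight)]

# Two connected nodes with one-hot features and an identity weight matrix.
adj = [[0.0, 1.0], [1.0, 0.0]]
feats = [[1.0, 0.0], [0.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]
hidden = gcn_layer(adj, feats, weight)
```

After one layer each node's representation is the average of its own and its neighbour's features, which is how structural information from the subgraph enters the feature vector.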
Based on the same inventive concept, the embodiment of the invention provides an XSS attack detection device, which can realize the functions corresponding to the XSS attack detection method. The XSS attack detection means may be a hardware structure, a software module, or a hardware structure plus a software module. The XSS attack detection device can be realized by a chip system, and the chip system can be formed by a chip or can contain the chip and other discrete devices. Referring to fig. 5, the XSS attack detection apparatus includes: a determining unit 501, a first obtaining unit 502, a second obtaining unit 503, and a processing unit 504, wherein:
A determining unit 501, configured to determine a code segment to be detected;
a first obtaining unit 502, configured to input the code segment to be detected into a first model, and obtain a context information feature vector corresponding to the code segment to be detected; the first model is used for extracting characteristics of the triplet context information corresponding to the code segment to be detected; the triplet context information is obtained based on path recognition of an abstract syntax tree constructed by the code segments to be detected;
a second obtaining unit 503, configured to input the code segment to be detected into a second model, and obtain a taint path information feature vector corresponding to the code segment to be detected; the second model is used for extracting features of a sensitive code path which corresponds to the code segment to be detected and is determined based on a taint source;
a processing unit 504, configured to input the context information feature vector and the taint path information feature vector into a third model, and obtain a detection result of the code segment to be detected; the third model is used for performing XSS attack recognition on the context information feature vector and the taint path information feature vector.
In a possible implementation manner, the first obtaining unit 502 is specifically configured to:
constructing an abstract syntax tree based on the code segment to be detected, inputting the abstract syntax tree into a sub-detection model in the first model, and obtaining triplet context information comprising an initial leaf node, a target leaf node and the path nodes connecting the initial leaf node and the target leaf node;
and constructing a subgraph based on the triplet context information, and determining a context information feature vector based on the subgraph.
In a possible implementation manner, the second obtaining unit 503 is specifically configured to:
constructing an abstract syntax tree, a control flow graph and a call graph based on the code segments to be detected;
performing path searching on the abstract syntax tree, the control flow graph and the call graph based on a preset path analysis algorithm to obtain all paths corresponding to the code segments to be detected; the path is used for representing the execution sequence of a group of codes;
determining, based on a preset static taint analysis algorithm, whether a taint source exists in each of the paths, and obtaining the paths that contain a taint source;
and analyzing and processing the paths that contain a taint source to obtain a taint path information feature vector corresponding to the code segment to be detected.
In a possible implementation manner, the processing unit 504 is specifically configured to:
based on the multilayer perceptron in the third model, perceiving context semantic information and code grammar information in the context information feature vector and the taint path information feature vector;
and carrying out XSS attack recognition based on the context semantic information and the code grammar information to obtain a detection result of the code segment to be detected.
In a possible implementation manner, the device further comprises a prompting unit, configured to:
and when the detection result of the code segment to be detected is that XSS attack exists, generating an alarm event, and sending the alarm event to an object associated with the code segment to be detected.
All relevant contents of each step involved in the foregoing embodiment of the XSS attack detection method may be cited in the functional description of the corresponding functional module of the XSS attack detection device in the embodiment of the present invention, and are not repeated here.
The division of modules in the embodiments of the present invention is schematic and is merely a division by logical function; other divisions are possible in actual implementation. In addition, the functional modules in each embodiment of the present invention may be integrated in one controller, may each exist physically on their own, or two or more modules may be integrated in one module. The integrated modules may be implemented in the form of hardware or in the form of software functional modules.
Based on the same inventive concept, an embodiment of the present invention provides a computer device. Referring to fig. 6, the computer device includes at least one processor 601 and a memory 602 connected to the at least one processor. The embodiment of the present invention does not limit the specific connection medium between the processor 601 and the memory 602; in fig. 6, the processor 601 and the memory 602 are connected by a bus 600 as an example. The bus 600 is shown as a thick line in fig. 6, and the manner of connection between other components is merely illustrative and not limiting. The bus 600 may be divided into an address bus, a data bus, a control bus, and so on; for ease of illustration it is represented by only one thick line in fig. 6, but this does not mean that there is only one bus or one type of bus. Furthermore, the computer device further comprises a communication interface 603 for receiving or transmitting data.
In the embodiment of the present invention, the memory 602 stores instructions executable by the at least one processor 601, and the at least one processor 601 may execute the steps included in the XSS attack detection method by executing the instructions stored in the memory 602.
The processor 601 is the control center of the computer device and may use various interfaces and lines to connect the various parts of the entire computer device. By running or executing the instructions stored in the memory 602 and invoking the data stored in the memory 602, it performs the various functions of the computer device and processes data, thereby monitoring the computer device as a whole.
Alternatively, the processor 601 may include one or more processing units, and the processor 601 may integrate an application processor and a modem processor, wherein the application processor primarily processes operating systems, user interfaces, application programs, and the like, and the modem processor primarily processes wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 601. In some embodiments, processor 601 and memory 602 may be implemented on the same chip, or they may be implemented separately on separate chips in some embodiments.
The processor 601 may be a general purpose processor such as a Central Processing Unit (CPU), digital signal processor, application specific integrated circuit, field programmable gate array or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, which may implement or perform the methods, steps and logic blocks disclosed in embodiments of the invention. The general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present invention may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in the processor for execution.
The memory 602 is a non-volatile computer readable storage medium that can be used to store non-volatile software programs, non-volatile computer executable programs, and modules. The memory 602 may include at least one type of storage medium, for example flash memory, hard disk, multimedia card, card memory, random access memory (Random Access Memory, RAM), static random access memory (Static Random Access Memory, SRAM), programmable read-only memory (Programmable Read Only Memory, PROM), read-only memory (Read Only Memory, ROM), electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), magnetic memory, magnetic disk, optical disk, and the like. The memory 602 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 602 in embodiments of the present invention may also be circuitry or any other device capable of performing a storage function, for storing program instructions and/or data.
By programming the processor 601, the code corresponding to the XSS attack detection method described in the foregoing embodiment may be burned into the chip, so that the chip can execute the steps of the foregoing XSS attack detection method at run time. How to program the processor 601 is a technique known to those skilled in the art and is not described in detail here.
Based on the same inventive concept, embodiments of the present invention also provide a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the XSS attack detection method as described above.
In some possible embodiments, various aspects of the XSS attack detection method provided by the present invention may also be implemented in the form of a program product comprising program code for causing a controlling computer device to carry out the steps of the XSS attack detection method according to the various exemplary embodiments of the present invention as described in the present specification, when said program product is run on the controlling computer device.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. An XSS attack detection method, the method comprising:
determining a code segment to be detected;
inputting the code segment to be detected into a first model to obtain a context information feature vector corresponding to the code segment to be detected; the first model is used for extracting characteristics of the triplet context information corresponding to the code segment to be detected; the triplet context information is obtained based on path recognition of an abstract syntax tree constructed by the code segments to be detected;
inputting the code segment to be detected into a second model to obtain a taint path information feature vector corresponding to the code segment to be detected; the second model is used for extracting features of a sensitive code path which corresponds to the code segment to be detected and is determined based on a taint source;
inputting the context information feature vector and the taint path information feature vector into a third model to obtain a detection result of the code segment to be detected; the third model is used for performing XSS attack recognition on the context information feature vector and the taint path information feature vector.
2. The method of claim 1, wherein inputting the code segment to be detected into a first model, obtaining a context information feature vector corresponding to the code segment to be detected, comprises:
constructing an abstract syntax tree based on the code segment to be detected, inputting the abstract syntax tree into a sub-detection model in the first model, and obtaining triplet context information comprising an initial leaf node, a target leaf node and the path nodes connecting the initial leaf node and the target leaf node;
and constructing a subgraph based on the triplet context information, and determining a context information feature vector based on the subgraph.
3. The method of claim 1, wherein inputting the code segment to be detected into a second model to obtain a taint path information feature vector corresponding to the code segment to be detected comprises:
constructing an abstract syntax tree, a control flow graph and a call graph based on the code segments to be detected;
performing path searching on the abstract syntax tree, the control flow graph and the call graph based on a preset path analysis algorithm to obtain all paths corresponding to the code segments to be detected; the path is used for representing the execution sequence of a group of codes;
determining, based on a preset static taint analysis algorithm, whether a taint source exists in each of the paths, and obtaining the paths that contain a taint source;
and analyzing and processing the paths that contain a taint source to obtain a taint path information feature vector corresponding to the code segment to be detected.
4. A method according to any one of claims 1-3, wherein inputting the context information feature vector and the taint path information feature vector into a third model to obtain a detection result of the code segment to be detected comprises:
based on the multilayer perceptron in the third model, perceiving context semantic information and code grammar information in the context information feature vector and the taint path information feature vector;
and carrying out XSS attack recognition based on the context semantic information and the code grammar information to obtain a detection result of the code segment to be detected.
5. The method of claim 4, wherein the method further comprises:
and when the detection result of the code segment to be detected is that XSS attack exists, generating an alarm event, and sending the alarm event to an object associated with the code segment to be detected.
6. An XSS attack detection apparatus, the apparatus comprising:
a determining unit for determining a code segment to be detected;
the first obtaining unit is used for inputting the code segment to be detected into a first model to obtain a context information feature vector corresponding to the code segment to be detected; the first model is used for extracting characteristics of the triplet context information corresponding to the code segment to be detected; the triplet context information is obtained based on path recognition of an abstract syntax tree constructed by the code segments to be detected;
the second obtaining unit is used for inputting the code segment to be detected into a second model to obtain a taint path information feature vector corresponding to the code segment to be detected; the second model is used for extracting features of a sensitive code path which corresponds to the code segment to be detected and is determined based on a taint source;
the processing unit is used for inputting the context information feature vector and the taint path information feature vector into a third model to obtain a detection result of the code segment to be detected; the third model is used for performing XSS attack recognition on the context information feature vector and the taint path information feature vector.
7. The apparatus according to claim 6, wherein the first obtaining unit is specifically configured to:
constructing an abstract syntax tree based on the code segment to be detected, inputting the abstract syntax tree into a sub-detection model in the first model, and obtaining triplet context information comprising an initial leaf node, a target leaf node and the path nodes connecting the initial leaf node and the target leaf node;
and constructing a subgraph based on the triplet context information, and determining a context information feature vector based on the subgraph.
8. The apparatus according to claim 6, wherein the second obtaining unit is specifically configured to:
constructing an abstract syntax tree, a control flow graph and a call graph based on the code segments to be detected;
performing path searching on the abstract syntax tree, the control flow graph and the call graph based on a preset path analysis algorithm to obtain all paths corresponding to the code segments to be detected; the path is used for representing the execution sequence of a group of codes;
determining, based on a preset static taint analysis algorithm, whether a taint source exists in each of the paths, and obtaining the paths that contain a taint source;
and analyzing and processing the paths that contain a taint source to obtain a taint path information feature vector corresponding to the code segment to be detected.
9. A computer device, the computer device comprising: memory, a processor and a computer program stored on the memory and executable on the processor, which when executed by the processor, implements the steps of the XSS attack detection method as claimed in any of claims 1 to 5.
10. A computer readable storage medium, having stored thereon a computer program which when executed by a processor implements the steps of the XSS attack detection method as claimed in any of claims 1 to 5.
CN202311660661.2A 2023-12-05 2023-12-05 XSS attack detection method and device, computer equipment and storage medium Pending CN117728995A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311660661.2A CN117728995A (en) 2023-12-05 2023-12-05 XSS attack detection method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311660661.2A CN117728995A (en) 2023-12-05 2023-12-05 XSS attack detection method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117728995A true CN117728995A (en) 2024-03-19

Family

ID=90202681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311660661.2A Pending CN117728995A (en) 2023-12-05 2023-12-05 XSS attack detection method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117728995A (en)

Similar Documents

Publication Publication Date Title
CN111639344B (en) Vulnerability detection method and device based on neural network
CN111600919B (en) Method and device for constructing intelligent network application protection system model
CN109905385B (en) Webshell detection method, device and system
CN112800427B (en) Webshell detection method and device, electronic equipment and storage medium
Zhu et al. Android malware detection based on multi-head squeeze-and-excitation residual network
CN108520180A (en) A kind of firmware Web leak detection methods and system based on various dimensions
CN108229170B (en) Software analysis method and apparatus using big data and neural network
CN112668013B (en) Java source code-oriented vulnerability detection method for statement-level mode exploration
CN107341399A (en) Assess the method and device of code file security
CN113010209A (en) Binary code similarity comparison technology for resisting compiling difference
Zeng et al. EtherGIS: a vulnerability detection framework for ethereum smart contracts based on graph learning features
Zhang et al. A php and jsp web shell detection system with text processing based on machine learning
Assefa et al. Intelligent phishing website detection using deep learning
CN116405326A (en) Information security management method and system based on block chain
CN115033895A (en) Binary program supply chain safety detection method and device
CN117235745B (en) Deep learning-based industrial control vulnerability mining method, system, equipment and storage medium
CN112817877B (en) Abnormal script detection method and device, computer equipment and storage medium
Tian et al. Plagiarism detection of multi-threaded programs via siamese neural networks
CN113918936A (en) SQL injection attack detection method and device
CN116074092B (en) Attack scene reconstruction system based on heterogram attention network
Kuang et al. Automated data-processing function identification using deep neural network
CN117728995A (en) XSS attack detection method and device, computer equipment and storage medium
CN111475812B (en) Webpage backdoor detection method and system based on data executable characteristics
Tian et al. Fine-grained obfuscation scheme recognition on binary code
Xiao et al. Detecting anomalies in cluster system using hybrid deep learning model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination