CN116910749A - Construction method, device and application of XSS attack detection model - Google Patents


Info

Publication number: CN116910749A
Application number: CN202310905601.6A
Authority: CN (China)
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 毛云青, 李斌, 曹鹏寅, 梁艺蕾
Assignee (current and original): CCI China Co Ltd
Application filed by CCI China Co Ltd; priority to CN202310905601.6A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The application provides a construction method, device, and application of an XSS attack detection model, comprising the following steps: word-frequency statistics are computed for the special characters in each sample and the sample is embedded as a sequence; an XSS attack detection model is constructed from a pre-trained neural network, a pre-trained gradient boosting tree, and a cross-attention network; the neural network extracts sequence features, which are concatenated with the special-character word-frequency statistics; the gradient boosting tree makes a decision on the concatenated features, and its leaf nodes are concatenated to obtain cross features; the cross-attention network computes cross-attention weights from the dot-product result and the cross features to obtain a cross weight vector; and the cross weight vector and the dot-product result are input into a classifier to obtain the detection result. Because features are extracted with a pre-trained neural network, the model is highly reusable, and the cross-attention network used for detection gives the model strong generalization and high prediction accuracy.

Description

Construction method, device and application of XSS attack detection model
Technical Field
The application relates to the field of network security, in particular to a method, a device and an application for constructing an XSS attack detection model.
Background
XSS (cross-site scripting) is a common web security vulnerability: an attacker exploits it to inject malicious script code into a web page, and when a user visits the page containing the malicious script, the script executes in the user's browser, allowing the attacker to perform malicious operations there. XSS attacks typically occur in web applications, particularly wherever users are allowed to submit or display content, such as forums, blog comments, and user input boxes. An attacker can inject content containing malicious code into user-submitted data; when other users visit the page containing that code, their browsers execute it.
In the prior art, a single deep-learning framework is generally used to classify XSS messages, for example a single LSTM or a simple RNN for feature extraction, combined with a deep neural network and a softmax or sigmoid function for classification. First, compared with ordinary text data, the corpus of XSS messages is quite limited, so the features extracted by an RNN are not comprehensive. Second, a conventional RNN considers sequence information in only one direction and ignores the relationship between the opposite-direction sequences. A single deep-learning model therefore does not exploit the information in an XSS message fully, and its accuracy is not high. Another approach computes word-frequency statistics of the special characters in an XSS message, takes the character frequencies as features, and classifies them with machine learning.
In view of the foregoing, there is a need for a model-based method that identifies XSS messages accurately, is easy to train, and generalizes well.
Disclosure of Invention
The embodiments of the application provide a construction method, device, and application of an XSS attack detection model. Sequence features are extracted by a pre-trained two-layer BiLSTM network, so the model is highly reusable and easy to train; the two-layer BiLSTM network is pre-trained with a MASK (masked-token) objective, which makes the model easier to transfer to downstream tasks; and a cross weight vector is obtained through a gradient boosting tree and a cross-attention network, giving the model strong generalization and high prediction accuracy.
In a first aspect, an embodiment of the present application provides a method for constructing an XSS attack detection model, where the method includes:
acquiring at least one piece of XSS message data as a training sample, and computing word-frequency statistics of the special characters in each training sample to generate table data, wherein the number of features in the table data equals the number of special-character types;
segmenting each training sample into words and passing the result through an embedding layer to obtain a sample sequence for each training sample;
constructing an XSS attack detection model consisting of a pre-trained neural network, a pre-trained gradient boosting tree, and a cross-attention network; extracting features from each sample sequence with the pre-trained neural network to obtain sequence features, and concatenating each sequence feature with the table data to obtain concatenated features;
inputting the concatenated features into the pre-trained gradient boosting tree for decision-making, and concatenating all leaf nodes of the gradient boosting tree to obtain cross features, wherein the cross features comprise a number of cross sub-features equal to the number of leaf nodes of the gradient boosting tree;
computing the dot product of the table data and the sequence features in the cross-attention network to obtain a dot-product result, concatenating the dot-product result with the cross features and feeding them into an MLP network to obtain the cross-attention weight of each cross sub-feature, deriving the cross weight vector of each cross feature from those weights, and inputting the cross weight vector and the dot-product result into a classifier to obtain a detection result;
and constructing a loss function to measure the error between the detection result and the ground truth; when the error meets a set condition, construction is complete and a trained XSS attack detection model is obtained.
In a second aspect, an embodiment of the present application provides an XSS attack detection method, where the method includes:
obtaining data to be detected, extracting its form information, and inputting it into an XSS attack detection model trained as in the above embodiment to obtain a detection result; converting the detection result into a JSON file and sending it to the user, wherein the JSON file contains the ID of the data to be detected and a keyword: if the keyword is true, the data is an XSS attack; if the keyword is false, it is not.
In a third aspect, an embodiment of the present application provides a device for constructing an XSS attack detection model, including:
an acquisition module, used for: acquiring at least one piece of XSS message data as a training sample, and computing word-frequency statistics of the special characters in each training sample to generate table data, wherein the number of features in the table data equals the number of special-character types;
a word segmentation module: segmenting each training sample into words and passing the result through an embedding layer to obtain a sample sequence for each training sample;
a construction module: constructing an XSS attack detection model consisting of a pre-trained neural network, a pre-trained gradient boosting tree, and a cross-attention network; extracting features from each sample sequence with the pre-trained neural network to obtain sequence features, and concatenating each sequence feature with the table data to obtain concatenated features;
a decision module: inputting the concatenated features into the pre-trained gradient boosting tree for decision-making, and concatenating all leaf nodes of the gradient boosting tree to obtain cross features, wherein the cross features comprise a number of cross sub-features equal to the number of leaf nodes of the gradient boosting tree;
a detection module: computing the dot product of the table data and the sequence features in the cross-attention network to obtain a dot-product result, concatenating the dot-product result with the cross features and feeding them into an MLP network to obtain the cross-attention weight of each cross sub-feature, deriving the cross weight vector of each cross feature from those weights, and inputting the cross weight vector and the dot-product result into a classifier to obtain a detection result;
a loss construction module: constructing a loss function to measure the error between the detection result and the ground truth; when the error meets a set condition, construction of the XSS attack detection model is complete and a trained model is obtained.
In a fourth aspect, an embodiment of the present application provides an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to run the computer program to execute a method for constructing an XSS attack detection model or a method for detecting an XSS attack.
In a fifth aspect, an embodiment of the present application provides a readable storage medium having stored therein a computer program including program code for controlling a process to execute a process including a construction method of an XSS attack detection model or an XSS attack detection method.
The main contributions and innovation points of the application are as follows:
according to the embodiment of the application, the feature extraction is carried out by adopting the pre-trained double-layer BiLSTM network, and the pre-training is carried out by adopting the training mode of the MASK MASK when the double-layer BiLSTM network is pre-trained, so that the model is easier to migrate in a downstream task; the pretrained double-layer BiLSTM network is adopted, so that the reusability of the whole XSS attack detection model is high; according to the scheme, xgBoots are adopted as gradient lifting trees, so that the calculation cost and the training cost of an XSS attack detection model can be reduced; according to the scheme, the accuracy of XSS attack model prediction is improved by constructing the cross attention network to distribute proper weights for each cross sub-feature so as to obtain the cross weight vector of the cross feature, and the generalization capability of the model is stronger due to the addition of the cross attention network; according to the scheme, the user sends the XSS message data to the XSS attack model in a post request mode, so that the model is convenient and quick to use.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the other features, objects, and advantages of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a flow chart of a method for constructing an XSS attack detection model according to an embodiment of the present application;
FIG. 2 is a block diagram of an LSTM cell unit in accordance with an embodiment of the application;
FIG. 3 is a block diagram of a two-layer BiLSTM network in accordance with an embodiment of the present application;
FIG. 4 is a schematic diagram of the decision process of the gradient boosting tree according to an embodiment of the application;
FIG. 5 is a schematic diagram of the dot-product result concatenated with the cross features and fed into the MLP network according to an embodiment of the application;
FIG. 6 is a schematic diagram of inputting the cross weight vector and the dot-product result into the classifier to obtain the detection result according to an embodiment of the present application;
FIG. 7 is a block diagram of an apparatus for constructing an XSS attack detection model according to an embodiment of the present application;
fig. 8 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with aspects of one or more embodiments of the present description as detailed in the accompanying claims.
It should be noted that: in other embodiments, the steps of the corresponding method are not necessarily performed in the order shown and described in this specification. In some other embodiments, the method may include more or fewer steps than described in this specification. Furthermore, individual steps described in this specification, in other embodiments, may be described as being split into multiple steps; while various steps described in this specification may be combined into a single step in other embodiments.
Example 1
The embodiment of the application provides a method for constructing an XSS attack detection model, and specifically refers to FIG. 1, wherein the method comprises the following steps:
acquiring at least one piece of XSS message data as a training sample, and computing word-frequency statistics of the special characters in each training sample to generate table data, wherein the number of features in the table data equals the number of special-character types;
segmenting each training sample into words and passing the result through an embedding layer to obtain a sample sequence for each training sample;
constructing an XSS attack detection model consisting of a pre-trained neural network, a pre-trained gradient boosting tree, and a cross-attention network; extracting features from each sample sequence with the pre-trained neural network to obtain sequence features, and concatenating each sequence feature with the table data to obtain concatenated features;
inputting the concatenated features into the pre-trained gradient boosting tree for decision-making, and concatenating all leaf nodes of the gradient boosting tree to obtain cross features, wherein the cross features comprise a number of cross sub-features equal to the number of leaf nodes of the gradient boosting tree;
computing the dot product of the table data and the sequence features in the cross-attention network to obtain a dot-product result, concatenating the dot-product result with the cross features and feeding them into an MLP network to obtain the cross-attention weight of each cross sub-feature, deriving the cross weight vector of each cross feature from those weights, and inputting the cross weight vector and the dot-product result into a classifier to obtain a detection result;
and constructing a loss function to measure the error between the detection result and the ground truth; when the error meets a set condition, construction is complete and a trained XSS attack detection model is obtained.
In some embodiments, the training samples can be obtained as XSS attack message data from a company-internal website or from open-source projects on GitHub; in this scheme a total of 200,000 positive samples and 40,000 negative samples were obtained.
In some embodiments, in the step of generating table data by word-frequency statistics of the special characters in each training sample, the special characters include "script", "java", "iframe", "<", ">", "\", "'", "%", "(", and ")". These special characters are used as the features of the table data, denoted V(v1, v2, …, v10), where v1–v10 are the feature types in the table data, each corresponding to one special character.
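The word-frequency table data described above can be sketched as follows. The exact character set and counting rules are assumptions for illustration, not specified beyond the list in the text:

```python
# Hypothetical sketch of the special-character word-frequency features V(v1..v10).
# The ten tokens below follow the list in the text; counting is assumed to be a
# simple substring count over the lower-cased message.
SPECIAL_TOKENS = ["script", "java", "iframe", "<", ">", "\\", "'", "%", "(", ")"]

def word_freq_features(message: str) -> list:
    """Return the 10-dimensional word-frequency vector V for one XSS message."""
    lowered = message.lower()
    return [lowered.count(tok) for tok in SPECIAL_TOKENS]

v = word_freq_features('<script>alert("xss")</script>')
```

Each message thus yields one row of the table data, with as many features as there are special-character types.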
In some embodiments, in the step of "obtaining a sample sequence of each training sample through an embedding layer after word segmentation of each training sample", each word-segmentation result of the training sample is converted into a corresponding numeric identifier, and feature mapping is then performed through the embedding layer to obtain the sample sequence of each training sample.
Specifically, based on the particularities of XSS messages, word segmentation follows these rules: 1. replace all Arabic numerals with 0; 2. replace all "http" and "https" links with "http://u" and segment them; 3. segment quoted content; 4. segment the content inside tags "< >", such as "<script>"; 5. segment parameter names such as "topic="; 6. segment words composed of characters.
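The six rules above can be approximated with regular expressions; the patterns below are an illustrative sketch, not the patent's actual tokenizer:

```python
import re

def tokenize(msg: str) -> list:
    """Approximate the six word-segmentation rules with regexes (assumed patterns)."""
    msg = re.sub(r"\d+", "0", msg)                  # rule 1: all digits -> 0
    msg = re.sub(r"https?://\S+", "http://u", msg)  # rule 2: links -> http://u
    pattern = (r"http://u"                # normalized links
               r"|</?[a-zA-Z]+>?"         # rule 4: tags like <script>, </script>
               r"|[a-zA-Z]+="             # rule 5: parameter names like topic=
               r"|'[^']*'|\"[^\"]*\""     # rule 3: quoted content
               r"|[a-zA-Z0]+"             # rule 6: character words (digits already 0)
               r"|\S")                    # any remaining single symbol
    return re.findall(pattern, msg)

tokens = tokenize("<script>alert(123)</script>")
```

For the example above the tokenizer yields `['<script>', 'alert', '(', '0', ')', '</script>']`, keeping tags and symbols as separate tokens.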
In some embodiments, each word-segmentation result of the training sample is converted into a corresponding numeric identifier, and feature mapping is then performed through the embedding layer to obtain the sample sequence D(d1, d2, …, dn), where d1–dn are the numeric identifiers corresponding to the word-segmentation results.
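Mapping tokens to numeric identifiers and embedding them can be sketched as follows; the vocabulary handling, padding scheme, and embedding dimension are assumptions:

```python
import numpy as np

# Hypothetical vocabulary; ids 0 and 1 are reserved for padding and masking.
vocab = {"<PAD>": 0, "<MASK>": 1}

def encode(tokens, max_len=20):
    """Map tokens to numeric identifiers D(d1..dn), padded/truncated to max_len."""
    ids = [vocab.setdefault(t, len(vocab)) for t in tokens][:max_len]
    return ids + [vocab["<PAD>"]] * (max_len - len(ids))

rng = np.random.default_rng(0)
embedding = rng.normal(size=(5000, 32))   # vocab_size x embedding_dim lookup table
D = encode(["<script>", "alert", "(", "0", ")", "</script>"])
sample_sequence = embedding[D]            # feature-mapped sample sequence, (20, 32)
```

In the real model the embedding table would be learned rather than random; the lookup step is the same.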
In some embodiments, the neural network is a two-layer BiLSTM network composed of a forward LSTM and a backward LSTM connected in parallel; the sequence features are obtained by concatenating the features extracted by the forward LSTM with those extracted by the backward LSTM.
In some embodiments, the structure of the LSTM cell unit is shown in FIG. 2. The LSTM's gating structure is rich enough that it can carry useful information over long spans without forgetting it, while mitigating the vanishing- and exploding-gradient problems.
In this scheme, because a conventional unidirectional recurrent network does not take the t+1 stage into account when generating the feature at time t, a two-layer BiLSTM network is adopted to capture more information.
Specifically, the structure of the two-layer BiLSTM network is shown in FIG. 3: the BiLSTM adds a backward LSTM on top of a forward LSTM, and the features extracted in the forward and backward directions are concatenated to obtain the sequence features.
Specifically, the two-layer BiLSTM captures context: it considers not only the input at the current moment but also the preceding and following inputs. By running two independent LSTM layers in the two directions of the time sequence, it learns past and future dependencies effectively, so sequence data can be better understood and predicted.
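The bidirectional feature extraction can be sketched in plain numpy as below. This is a minimal single-BiLSTM-layer sketch with random parameters (the patent stacks two layers and learns the parameters); the weight layout and sizes are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_layer(X, Wx, Wh, b, H):
    """Run one LSTM direction over X of shape (T, D); return hidden states (T, H)."""
    T = X.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    out = np.zeros((T, H))
    for t in range(T):
        z = X[t] @ Wx + h @ Wh + b           # (4H,) gate pre-activations
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)           # cell-state update
        h = o * np.tanh(c)                   # hidden state
        out[t] = h
    return out

def bilstm_features(X, fwd_params, bwd_params, H):
    """Concatenate forward and time-reversed backward hidden states."""
    fwd = lstm_layer(X, *fwd_params, H)
    bwd = lstm_layer(X[::-1], *bwd_params, H)[::-1]
    return np.concatenate([fwd, bwd], axis=1)  # sequence features, (T, 2H)

rng = np.random.default_rng(0)
D_in, H, T = 32, 16, 20
def init_params():
    return (rng.normal(scale=0.1, size=(D_in, 4 * H)),
            rng.normal(scale=0.1, size=(H, 4 * H)),
            np.zeros(4 * H))

seq_features = bilstm_features(rng.normal(size=(T, D_in)), init_params(), init_params(), H)
```

Each time step's feature is the concatenation of the forward and backward hidden states, which is exactly the forward/backward concatenation the text describes.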
In some embodiments, when the neural network is pre-trained, a first number of tokens in the sample sequence is randomly selected and replaced with MASK tokens.
For example, in this scheme, when training the neural network, the parameter max_len is set to 20 and the mask ratio to 20%, i.e., 4 tokens are masked; the BiLSTM uses 1 hidden layer, and 50 epochs with a batch_size of 10000 are chosen as training hyper-parameters.
Specifically, pre-training the neural network with the MASK training mode makes the model easier to transfer to downstream tasks.
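The masking step of this pre-training can be sketched as follows; the label convention (-1 for non-masked positions) and the reserved MASK id are assumptions:

```python
import numpy as np

def mask_tokens(ids, mask_id=1, ratio=0.2, seed=0):
    """Randomly replace a fraction of token ids with the MASK id.

    Returns (masked_ids, labels), where labels hold the original id at masked
    positions and -1 elsewhere (positions the loss should ignore).
    """
    rng = np.random.default_rng(seed)
    ids = np.array(ids)
    n_mask = max(1, int(len(ids) * ratio))        # e.g. 4 of max_len = 20 tokens
    pos = rng.choice(len(ids), size=n_mask, replace=False)
    labels = np.full(len(ids), -1)
    labels[pos] = ids[pos]
    ids[pos] = mask_id
    return ids, labels

masked, labels = mask_tokens(list(range(2, 22)))  # a 20-token sample sequence
```

The BiLSTM would then be trained to predict the original ids at the masked positions, which is what makes the learned representations transferable downstream.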
In some embodiments, the sequence features are concatenated with the table data along the row direction. Illustratively, the sequence features O(o1, o2, …, on) and the table data V(v1, v2, …, v10) are concatenated (concat) in the row direction to form the concatenated features, where n is the set maximum number of characters.
In some embodiments, in the step of inputting the concatenated features into a pre-trained gradient boosting tree for decision-making and concatenating all leaf nodes of the gradient boosting tree to obtain cross features, the concatenated features are split by the gradient boosting tree into two subtrees, and all weighted leaf nodes of the two subtrees are concatenated to obtain the cross features.
Specifically, the decision process of the gradient boosting tree is shown in FIG. 4. The concatenated features composed of the sequence features O(o1, o2, …, on) and the table data V(v1, v2, …, v10) are split by the gradient boosting tree into two subtrees, Q1 and Q2, with 3 and 5 weighted leaf nodes respectively, so the resulting cross vector has length 8; since the leaf nodes at the second, fourth, and seventh positions are the activated ones, a vector with 1 at those positions and 0 elsewhere is obtained.
Specifically, each leaf node contains a weight. When a set of features is input to the gradient boosting tree, the tree determines which leaf node the features fall into; a leaf node that is reached is considered activated, and leaf nodes that are not reached are inactive.
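Turning activated leaf nodes into a cross-feature vector can be sketched as below; the example uses two subtrees with 3 and 5 weighted leaves for illustration:

```python
import numpy as np

def leaf_one_hot(leaf_indices, leaves_per_tree):
    """Concatenate one-hot leaf activations of each tree into one cross feature.

    leaf_indices[i] is the (0-based) index of the activated leaf in tree i;
    leaves_per_tree[i] is that tree's number of weighted leaves.
    """
    parts = []
    for idx, n_leaves in zip(leaf_indices, leaves_per_tree):
        v = np.zeros(n_leaves)
        v[idx] = 1.0                      # the leaf the sample falls into
        parts.append(v)
    return np.concatenate(parts)

# two subtrees with 3 and 5 weighted leaves; the sample activates leaf 1 and leaf 3
q = leaf_one_hot([1, 3], [3, 5])
```

Each position of `q` corresponds to one cross sub-feature, so the cross-feature length equals the total number of weighted leaf nodes.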
In some embodiments, the gradient boosting tree used in this scheme is XGBoost. A conventional gradient boosting tree is the sum of several decision-tree models, each subtree fitting the residual of the preceding trees' results, as in the following formula:
$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F}$
wherein $\mathcal{F}$ is the set of decision trees in the gradient boosting tree and $f$ is a decision subtree; the features $x$ are input into the gradient boosting tree and the predicted value $\hat{y}_i$ is obtained by inference. XGBoost improves on the conventional gradient boosting tree, first with a regularized objective function:
$\mathcal{L} = \sum_i l(\hat{y}_i, y_i) + \sum_k \Omega(f_k), \qquad \Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^2$
wherein $l$ is a differentiable convex loss function measuring the difference between the predicted value $\hat{y}_i$ and the true value $y_i$, and $\Omega$ is a regularization term that limits the complexity of the model, covering the number of leaves and the magnitude of the leaf weights; $T$ is the number of leaves, $w$ the leaf weights, and $\gamma$ and $\lambda$ are hyper-parameters that adjust the model's complexity.
During pre-training, the gradient boosting tree in this scheme uses a cross-entropy loss as the classification objective of XGBoost; the maximum tree depth is 6, the number of leaves is at most 59, and the learning rate is 0.01. For sample selection, a one-sided gradient sampling (one-sided gradient estimation) method is used, mainly to reduce the amount of computation during gradient calculation. Conventional gradient estimation typically uses all samples to compute the gradient, but with large-scale data sets or complex models this is expensive. One-sided gradient sampling reduces the computational complexity by estimating the gradient with only a portion of the samples, selected according to the application scenario and the sampling strategy. During training, one-sided gradient sampling is typically combined with stochastic gradient descent (SGD) or mini-batch gradient descent.
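The stated hyper-parameters map onto XGBoost-style parameters roughly as follows; the parameter names follow the xgboost API and the mapping is an assumption, since the patent does not give a configuration listing:

```python
# Hypothetical mapping of the hyper-parameters stated above onto xgboost
# parameter names; "grow_policy": "lossguide" is needed for max_leaves to apply.
xgb_params = {
    "objective": "binary:logistic",  # binary classification with cross-entropy loss
    "max_depth": 6,                  # maximum tree depth
    "grow_policy": "lossguide",
    "max_leaves": 59,                # at most 59 leaves per tree
    "learning_rate": 0.01,
}
```

Such a dictionary would be passed to `xgboost.train` or `XGBClassifier` in practice; the one-sided gradient sampling strategy described in the text is a separate sample-selection step, not part of this parameter dictionary.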
In some embodiments, the tabular data, the sequence features and the cross features are projected into the same dimension before being input into the cross-attention network.
In particular, projecting the tabular data, the sequence features and the cross features to the same dimension may facilitate computation in a cross-attention network.
Illustratively, the cross feature Q(q1, q2, …, qk), the sequence features O(o1, o2, …, o20), and the table data V(v1, v2, …, v10) are projected into the L dimension, which can be expressed by the following formulas:
$Q' = (q'_1, q'_2, \dots, q'_L) = Q\,W_Q$
$O' = (o'_1, o'_2, \dots, o'_L) = O\,W_O$
$V' = (v'_1, v'_2, \dots, v'_L) = V\,W_V$
wherein Q′, O′, and V′ are respectively the cross feature, the sequence features, and the table data after projection, and $W_Q$, $W_O$, $W_V$ are the corresponding projection matrices.
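The three projections can be sketched as plain matrix products; the dimensions below are illustrative and the projection matrices would be learned rather than random:

```python
import numpy as np

rng = np.random.default_rng(0)
L, k, n_seq, n_tab = 16, 8, 20, 10       # L: shared projection dimension (assumed)

W_Q = rng.normal(size=(k, L))            # learned projection matrices (random here)
W_O = rng.normal(size=(n_seq, L))
W_V = rng.normal(size=(n_tab, L))

Q = rng.normal(size=k)                   # cross feature
O_feat = rng.normal(size=n_seq)          # sequence features
V_tab = rng.normal(size=n_tab)           # table data

Q_p, O_p, V_p = Q @ W_Q, O_feat @ W_O, V_tab @ W_V   # all projected to length L
```

After projection all three vectors have the same length L, so the dot product and concatenation in the cross-attention network are well-defined.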
In some embodiments, in the step of "concatenating the dot-product result with the cross features and feeding them into the MLP network to obtain the cross-attention weight of each cross sub-feature": the combination weight of the dot-product result and the cross feature is computed in the cross-attention network; the network's bias parameter is added to obtain the biased combination; the biased combination, activated by the activation function, is multiplied by the transpose of the weight matrix to obtain the initial weight; and the initial weight is normalized to obtain the cross-attention weight.
Specifically, a schematic diagram of the dot-product result concatenated with the cross features is shown in FIG. 5. In this scheme the weight matrix is learned by the training algorithm; the combination weight is a parameter of the cross-attention network and is continually adjusted to make the network's predictions more accurate. The specific formula is:
$w_l = \mathrm{softmax}\big(h^{\top}\,\mathrm{ReLU}(W[p; q_l] + b)\big)$
wherein $h$ is the weight matrix, ReLU is the activation function adopted in this scheme, $W$ is the combination weight, $b$ is the bias parameter, $[p; q_l]$ denotes the concatenation of the dot-product result $p$ with the cross sub-feature $q_l$, and softmax performs the normalization.
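The weight computation can be sketched as follows; the dimensions and parameter initialization are assumptions, and the formula follows the ReLU-then-transpose-then-softmax description above:

```python
import numpy as np

def cross_attention_weights(p, Q_sub, W, b, h):
    """Compute w_l = softmax(h^T ReLU(W [p; q_l] + b)) over cross sub-features.

    p: dot-product result, shape (L,); Q_sub: cross sub-features, shape (k, L);
    W: combination weight, (m, 2L); b: bias, (m,); h: weight matrix, (m,).
    """
    scores = np.array([h @ np.maximum(0.0, W @ np.concatenate([p, q]) + b)
                       for q in Q_sub])
    e = np.exp(scores - scores.max())   # numerically stable softmax
    return e / e.sum()

weights = cross_attention_weights(
    np.ones(4), np.eye(3, 4), np.ones((5, 8)) * 0.1, np.zeros(5), np.ones(5))
```

Because the softmax normalizes over the k cross sub-features, the resulting weights sum to one, giving each sub-feature its share of attention.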
Specifically, training the model with the constructed cross-attention network gives the trained model stronger generalization and higher accuracy.
In some embodiments, in the step of "deriving a cross weight vector for each cross feature from the cross-attention weights of each cross sub-feature", the cross-attention weights of the cross sub-features are processed either with an average pooling layer, whose output (the average cross weight vector) serves as the cross weight vector of the corresponding cross feature, or with a max pooling layer, whose output (the maximum cross weight vector) serves as the cross weight vector of the corresponding cross feature.
Illustratively, the formula for processing the cross-attention weights of the cross sub-features with the average pooling layer is:
$e_{\mathrm{avg}}(Q) = \frac{1}{k}\sum_{l=1}^{k} w_l\,q_l$
wherein $e_{\mathrm{avg}}(Q)$ is the average cross weight vector, $w_l$ is the cross-attention weight of each cross sub-feature, and $q_l$ is a cross sub-feature.
Illustratively, the formula for processing the cross attention weights of each cross sub-feature using the max pooling layer is as follows:

e_max(q) = max_{l∈L} w_ul · q_l

wherein e_max(q) is the maximum cross weight vector, w_ul is the cross attention weight of each cross sub-feature, and q_l is a cross sub-feature.
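Both pooling variants can be sketched in a few lines of NumPy. The stacking of sub-features into one array is an implementation assumption; the patent only specifies weighting each sub-feature by its attention weight and then average- or max-pooling.

```python
import numpy as np

def cross_weight_vector(weights, sub_features, mode="avg"):
    """Aggregate per-sub-feature attention into one cross weight vector.

    mode="avg": average-pool the weighted sub-features (e_avg).
    mode="max": element-wise max-pool the weighted sub-features (e_max).
    """
    weighted = weights[:, None] * np.stack(sub_features)  # w_ul * q_l
    if mode == "avg":
        return weighted.mean(axis=0)   # average cross weight vector
    return weighted.max(axis=0)        # maximum cross weight vector
```

Average pooling yields a smoother summary of all cross sub-features, while max pooling keeps only the strongest weighted response per dimension.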
In some embodiments, as shown in fig. 6, the cross weight vector and the dot product result are input into a classifier to obtain the detection result.
Example II
An XSS attack detection method, comprising:
obtaining data to be detected and extracting form information of the data to be detected; inputting the data to be detected into the XSS attack detection model trained in the above embodiment to obtain a detection result; converting the detection result into a json file and sending the json file to a user, wherein the json file contains ID information of the data to be detected and a keyword; if the keyword is displayed as true, the data to be detected is an XSS attack, and if the keyword is displayed as false, the data to be detected is not an XSS attack.
Specifically, the user can send the data to be detected to the XSS attack detection model on the web side by means of a POST request; in this way, the model can be called more conveniently and quickly.
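As a sketch, the json response described above might be assembled as follows. The field names (`id`, `result`) and the `/detect` endpoint are illustrative assumptions, since the patent only specifies that the json file carries the ID information and a true/false keyword.

```python
import json

def build_response(sample_id, is_xss):
    """Hypothetical response assembly: ID information plus a
    true/false keyword indicating whether the input is an XSS attack."""
    return json.dumps({"id": sample_id, "result": bool(is_xss)})

# A client could call the model service with a POST request, e.g.:
#   requests.post("http://<host>/detect", data={"payload": suspicious_input})
print(build_response("req-001", True))  # → {"id": "req-001", "result": true}
```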
Example III
Based on the same concept, referring to fig. 7, the application further provides a device for constructing an XSS attack detection model, comprising:
the acquisition module is used for: acquiring at least one XSS message data as a training sample, and performing word frequency statistics on special characters in each training sample to generate table data, wherein the feature quantity in the table data is the same as the types of the special characters;
the word segmentation module: after word segmentation is carried out on each training sample, a sample sequence of each training sample is obtained through an embedding layer;
the construction module comprises: constructing an XSS attack detection model, wherein the XSS attack detection model consists of a pre-trained neural network, a pre-trained gradient lifting tree and a cross attention network, extracting features of each sample sequence by using the pre-trained neural network to obtain sequence features, and performing feature stitching on each sequence feature and the table data to obtain stitching features;
decision module: inputting the splicing features into a pre-trained gradient lifting tree for decision making, and carrying out feature splicing on all leaf nodes of the gradient lifting tree to obtain cross features, wherein the cross features comprise a plurality of cross sub-features, and the number of the cross sub-features is the same as that of the leaf nodes of the gradient lifting tree;
The detection module: performing point multiplication on the table data and the sequence features in the cross attention network to obtain a point multiplication result, splicing the point multiplication result and the cross features, sending the spliced point multiplication result and the cross features into an MLP network to obtain cross attention weights of each cross sub-feature, obtaining cross weight vectors of each cross feature according to the cross attention weights of each cross sub-feature, and inputting the cross weight vectors and the point multiplication result into a classifier to obtain a detection result;
loss construction module: and constructing a loss function to measure the error between the detection result and the real result, and completing the construction of the XSS attack detection model when the error meets the set condition to obtain a trained XSS attack detection model.
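The acquisition module's word-frequency step can be illustrated with a minimal sketch: each training sample is reduced to one table-data row with one feature per special-character type. The particular set of special characters below is an assumption for illustration; the patent does not enumerate which characters are counted.

```python
import numpy as np

# Illustrative set of special characters often seen in XSS payloads;
# the actual character inventory is not specified by the patent.
SPECIAL_CHARS = ["<", ">", "'", '"', "(", ")", ";", "&", "%", "/"]

def table_features(sample: str) -> np.ndarray:
    """Word-frequency statistics of special characters -> table data.
    The number of features equals the number of special-character types."""
    return np.array([sample.count(c) for c in SPECIAL_CHARS], dtype=float)

t = table_features("<script>alert('xss')</script>")
```

Rows produced this way are later spliced with the sequence features extracted by the pre-trained neural network before being fed to the gradient lifting tree.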
Example IV
This embodiment also provides an electronic device, referring to fig. 8, comprising a memory 404 and a processor 402, the memory 404 having stored therein a computer program, the processor 402 being arranged to run the computer program to perform the steps of any of the method embodiments described above.
In particular, the processor 402 may include a Central Processing Unit (CPU), or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits that implement embodiments of the present application.
The memory 404 may include mass storage for data or instructions. By way of example, and not limitation, the memory 404 may comprise a hard disk drive (HDD), a floppy disk drive, a solid state drive (SSD), flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. The memory 404 may include removable or non-removable (or fixed) media, where appropriate. The memory 404 may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory 404 is a non-volatile memory. In particular embodiments, the memory 404 includes read-only memory (ROM) and random access memory (RAM). Where appropriate, the ROM may be a mask-programmed ROM, a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), an electrically rewritable ROM (EAROM), or flash memory (FLASH), or a combination of two or more of these. Where appropriate, the RAM may be static random access memory (SRAM) or dynamic random access memory (DRAM), and the DRAM may be fast page mode DRAM (FPMDRAM), extended data output DRAM (EDODRAM), synchronous DRAM (SDRAM), or the like.
Memory 404 may be used to store or cache various data files that need to be processed and/or used for communication, as well as possible computer program instructions for execution by processor 402.
Processor 402 implements the method of constructing an XSS attack detection model according to any of the above embodiments by reading and executing computer program instructions stored in memory 404.
Optionally, the electronic apparatus may further include a transmission device 406 and an input/output device 408, where the transmission device 406 is connected to the processor 402 and the input/output device 408 is connected to the processor 402.
The transmission device 406 may be used to receive or transmit data via a network. Specific examples of the network described above may include a wired or wireless network provided by a communication provider of the electronic device. In one example, the transmission device includes a network adapter (Network Interface Controller, simply referred to as NIC) that can connect to other network devices through the base station to communicate with the internet. In one example, the transmission device 406 may be a Radio Frequency (RF) module, which is configured to communicate with the internet wirelessly.
The input-output device 408 is used to input or output information. In this embodiment, the input information may be training samples, sample sequences, etc., and the output information may be detection results of XSS attack detection, etc.
Alternatively, in the present embodiment, the above-mentioned processor 402 may be configured to execute the following steps by a computer program:
s101, acquiring at least one XSS message data as a training sample, and performing word frequency statistics on special characters in each training sample to generate table data, wherein the number of features in the table data is the same as the types of the special characters;
s102, after word segmentation is carried out on each training sample, a sample sequence of each training sample is obtained through an embedding layer;
s103, constructing an XSS attack detection model, wherein the XSS attack detection model consists of a pre-trained neural network, a pre-trained gradient lifting tree and a cross attention network, each sample sequence is subjected to feature extraction by using the pre-trained neural network to obtain sequence features, and each sequence feature and the table data are subjected to feature stitching to obtain stitching features;
s104, inputting the splicing features into a pre-trained gradient lifting tree for decision making, and carrying out feature splicing on all leaf nodes of the gradient lifting tree to obtain cross features, wherein the cross features comprise a plurality of cross sub-features, and the number of the cross sub-features is the same as that of the leaf nodes of the gradient lifting tree;
s105, performing point multiplication on the table data and the sequence features in the cross attention network to obtain a point multiplication result, splicing the point multiplication result and the cross features, sending the spliced point multiplication result and the cross features into an MLP network to obtain cross attention weight of each cross sub-feature, obtaining a cross weight vector of each cross feature according to the cross attention weight of each cross sub-feature, and inputting the cross weight vector and the point multiplication result into a classifier to obtain a detection result;
s106, constructing a loss function to measure the error between the detection result and the real result, and completing the construction of the XSS attack detection model when the error meets the set condition to obtain a trained XSS attack detection model.
It should be noted that, specific examples in this embodiment may refer to examples described in the foregoing embodiments and alternative implementations, and this embodiment is not repeated herein.
In general, the various embodiments may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. Some aspects of the application may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the application is not limited thereto. While various aspects of the application may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
Embodiments of the application may be implemented by computer software executable by a data processor of a mobile device, such as in a processor entity, or by hardware, or by a combination of software and hardware. Computer software or programs (also referred to as program products) including software routines, applets, and/or macros can be stored in any apparatus-readable data storage medium and they include program instructions for performing particular tasks. The computer program product may include one or more computer-executable components configured to perform embodiments when the program is run. The one or more computer-executable components may be at least one software code or a portion thereof. In this regard, it should also be noted that any block of the logic flow as in fig. 8 may represent a program step, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions. The software may be stored on a physical medium such as a memory chip or memory block implemented within a processor, a magnetic medium such as a hard disk or floppy disk, and an optical medium such as, for example, a DVD and its data variants, a CD, etc. The physical medium is a non-transitory medium.
It should be understood by those skilled in the art that the technical features of the above embodiments may be combined in any manner. For brevity, not all possible combinations of these technical features are described; however, as long as a combination is not contradictory, it should be considered to be within the scope of this description.
The foregoing examples illustrate only a few embodiments of the application; although they are described in detail, they are not to be construed as limiting the scope of the application. It should be noted that several variations and modifications may be made by those skilled in the art without departing from the spirit of the application, all of which fall within the protection scope of the application. Accordingly, the scope of the application should be determined by the appended claims.

Claims (10)

1. A method for constructing an XSS attack detection model, characterized by comprising the following steps:
acquiring at least one XSS message data as a training sample, and performing word frequency statistics on special characters in each training sample to generate table data, wherein the feature quantity in the table data is the same as the types of the special characters;
after word segmentation is carried out on each training sample, a sample sequence of each training sample is obtained through an embedding layer;
constructing an XSS attack detection model, wherein the XSS attack detection model consists of a pre-trained neural network, a pre-trained gradient lifting tree and a cross attention network, extracting features of each sample sequence by using the pre-trained neural network to obtain sequence features, and performing feature stitching on each sequence feature and the table data to obtain stitching features;
inputting the splicing features into a pre-trained gradient lifting tree for decision making, and carrying out feature splicing on all leaf nodes of the gradient lifting tree to obtain cross features, wherein the cross features comprise a plurality of cross sub-features, and the number of the cross sub-features is the same as that of the leaf nodes of the gradient lifting tree;
performing point multiplication on the table data and the sequence features in the cross attention network to obtain a point multiplication result, splicing the point multiplication result and the cross features, sending the spliced point multiplication result and the cross features into an MLP network to obtain cross attention weights of each cross sub-feature, obtaining cross weight vectors of each cross feature according to the cross attention weights of each cross sub-feature, and inputting the cross weight vectors and the point multiplication result into a classifier to obtain a detection result;
and constructing a loss function to measure the error between the detection result and the real result, and completing the construction of the XSS attack detection model when the error meets the set condition to obtain a trained XSS attack detection model.
2. The method of claim 1, wherein, when the neural network is pre-trained, a first number of characters in the sample sequence are randomly selected as MASK tokens.
3. The method for constructing an XSS attack detection model according to claim 1, wherein in the step of inputting the splicing feature into the gradient lifting tree to make a decision and performing feature splicing on all leaf nodes of the gradient lifting tree to obtain a cross feature, the splicing feature obtains two sub-trees through splitting of the gradient lifting tree, and performs feature splicing on all leaf nodes with weights in the two sub-trees to obtain the cross feature.
4. The method of claim 1, wherein the table data, the sequence features and the cross features are projected to a same dimension and then input to the cross-attention network.
5. The method for constructing an XSS attack detection model according to claim 1, wherein in the step of "splicing the point multiplication result with the cross feature and sending the spliced result to an MLP network to obtain a cross attention weight of each cross sub-feature", a combination weight of the point multiplication result and the cross feature is obtained in the cross attention network, a bias combination weight is obtained by adding the combination weight and a bias parameter in the cross attention network, a weight matrix is obtained, an initial weight is obtained by multiplying the bias combination weight activated by an activation function by a transposed matrix of the weight matrix, and the cross attention weight is obtained by normalizing the initial weight.
6. The method according to claim 1, wherein in the step of obtaining a cross weight vector of each cross feature according to the cross attention weight of each cross sub-feature, the cross attention weight of each cross sub-feature is processed by an average pooling layer to obtain an average cross weight vector, which serves as the cross weight vector of the corresponding cross feature, or the cross attention weight of each cross sub-feature is processed by a max pooling layer to obtain a maximum cross weight vector, which serves as the cross weight vector of the corresponding cross feature.
7. An XSS attack detection method, comprising the steps of:
obtaining data to be detected and extracting form information of the data to be detected; inputting the data to be detected into a trained XSS attack detection model to obtain a detection result; converting the detection result into a json file and sending the json file to a user, wherein the json file contains ID information of the data to be detected and a keyword; if the keyword is displayed as true, the data to be detected is an XSS attack, and if the keyword is displayed as false, the data to be detected is not an XSS attack.
8. A device for constructing an XSS attack detection model, characterized by comprising:
the acquisition module is used for: acquiring at least one XSS message data as a training sample, and performing word frequency statistics on special characters in each training sample to generate table data, wherein the feature quantity in the table data is the same as the types of the special characters;
the word segmentation module: after word segmentation is carried out on each training sample, a sample sequence of each training sample is obtained through an embedding layer;
the construction module comprises: constructing an XSS attack detection model, wherein the XSS attack detection model consists of a pre-trained neural network, a pre-trained gradient lifting tree and a cross attention network, extracting features of each sample sequence by using the pre-trained neural network to obtain sequence features, and performing feature stitching on each sequence feature and the table data to obtain stitching features;
decision module: inputting the splicing features into a pre-trained gradient lifting tree for decision making, and carrying out feature splicing on all leaf nodes of the gradient lifting tree to obtain cross features, wherein the cross features comprise a plurality of cross sub-features, and the number of the cross sub-features is the same as that of the leaf nodes of the gradient lifting tree;
The detection module: performing point multiplication on the table data and the sequence features in the cross attention network to obtain a point multiplication result, splicing the point multiplication result and the cross features, sending the spliced point multiplication result and the cross features into an MLP network to obtain cross attention weights of each cross sub-feature, obtaining cross weight vectors of each cross feature according to the cross attention weights of each cross sub-feature, and inputting the cross weight vectors and the point multiplication result into a classifier to obtain a detection result;
loss construction module: and constructing a loss function to measure the error between the detection result and the real result, and completing the construction of the XSS attack detection model when the error meets the set condition to obtain a trained XSS attack detection model.
9. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, the processor being arranged to run the computer program to perform a method of constructing an XSS attack detection model as claimed in any of claims 1-6 or a method of XSS attack detection according to claim 7.
10. A readable storage medium, wherein a computer program is stored in the readable storage medium, the computer program comprising program code for controlling a process to execute the process, the process comprising a method of constructing an XSS attack detection model according to any of claims 1-6 or an XSS attack detection method according to claim 7.
CN202310905601.6A 2023-07-21 2023-07-21 Construction method, device and application of XSS attack detection model Pending CN116910749A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310905601.6A CN116910749A (en) 2023-07-21 2023-07-21 Construction method, device and application of XSS attack detection model

Publications (1)

Publication Number Publication Date
CN116910749A true CN116910749A (en) 2023-10-20

Family

ID=88357896

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310905601.6A Pending CN116910749A (en) 2023-07-21 2023-07-21 Construction method, device and application of XSS attack detection model

Country Status (1)

Country Link
CN (1) CN116910749A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination