CN115242539B - Network attack detection method and device for power grid information system based on feature fusion - Google Patents

Network attack detection method and device for power grid information system based on feature fusion

Info

Publication number
CN115242539B
Authority
CN
China
Prior art keywords
url
data
learning
token
input
Prior art date
Legal status
Active
Application number
CN202210905203.XA
Other languages
Chinese (zh)
Other versions
CN115242539A (en)
Inventor
曾纪钧
梁哲恒
沈桂泉
龙震岳
张金波
张小陆
崔磊
沈伍强
Current Assignee
Guangdong Power Grid Co Ltd
Original Assignee
Guangdong Power Grid Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Power Grid Co Ltd
Priority to CN202210905203.XA
Publication of CN115242539A
Application granted
Publication of CN115242539B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S40/00 Systems for electrical power generation, transmission, distribution or end-user application management characterised by the use of communication or information technologies, or communication or information technology specific aspects supporting them
    • Y04S40/20 Information technology specific aspects, e.g. CAD, simulation, modelling, system security


Abstract

The invention discloses a network attack detection method and device for a power grid information system based on feature fusion. The method comprises the following steps: performing data preprocessing on sample URL data; based on the preprocessed URL data, extracting text features including vocabulary features and statistical features, constructing text feature vectors, and learning the potential interaction relations between the text feature vectors with an FFM; based on the preprocessed URL data, performing token extraction to obtain tokens from the URLs, learning vector representations of the URLs through word2vec, and learning the distance dependency relations among the URL token vectors with a time convolution network; and cooperatively training the FFM and the time convolution network with a self-paced learning strategy, identifying the URL data to be detected with the trained model, and completing malicious URL detection based on the identification result of feature fusion. The invention provides an effective means for detecting malicious URLs in the power grid that have short life cycles and change dynamically with obfuscation strategies.

Description

Network attack detection method and device for power grid information system based on feature fusion
Technical Field
The invention belongs to the field of electric power information security, and particularly relates to a method and a device for detecting network attacks on a power grid information system based on feature fusion.
Background
With the continuous advance of electric power data construction, the introduction of information technology brings convenience to the electric power system but also brings a great number of problems, one of the most notable being network security. The objective of power network information security is that precautions must be taken to protect the confidentiality, integrity and availability of power grid information. Confidentiality means that only authorized personnel can access the information of the power information system; if a network attacker obtains this information at will, it can be misused to cause irreparable damage. Integrity is the maintenance and assurance of the true completeness of the power system data, preventing unauthorized alteration and corruption of the data. Availability is the protection of the information system from faults: information must be provided to authorized parties in the grid in time when needed, without affecting security. Typical network attacks in smart grid applications are directed primarily at one or more of confidentiality, integrity and availability. Therefore, accurately identifying the vulnerabilities and network security threats in the power grid and effectively formulating strategies to protect the confidentiality, integrity and availability of power grid information are of great significance for guaranteeing the stable operation of the power grid system.
The vast scale of the power information network results in an increasing number of network attacks and vulnerabilities. Despite the wide variety of network crimes currently encountered, the uniform resource locator (Uniform Resource Locator, URL) has served as the "gateway" linking infected grid users with malicious agents. These malicious URLs masquerade as normal websites and accounts, and before clicking on them it is difficult for potential victims to determine whether a URL is malicious or benign. Thus, detecting malicious URLs in advance is an indispensable task for protecting vulnerable users from network attacks.
In order to detect malicious websites, mainstream browsers generally adopt a website blacklist to prevent users from accessing them. Blacklist-based solutions require maintaining a large blacklist and determining whether a URL is benign by lookup. Such solutions are easy to implement, but when facing the explosive growth of user-generated content, keeping the blacklist up to date becomes very difficult, and constructing a large-scale blacklist requires great effort, a large amount of manpower and much time. In order to detect malicious URLs automatically, many studies construct a knowledge base through feature engineering and detect malicious URLs using classical machine learning algorithms, including support vector machines, decision trees, random forests, naive Bayes, and the like. However, machine-learning-based solutions rely heavily on feature engineering. Because attackers also continuously adjust their strategies over time, invalidating some of the learned information, models trained on short-lifecycle training datasets do not work well for emerging malicious URLs. Recently, deep neural networks (DNN, Deep Neural Network), including convolutional neural networks (CNN, Convolutional Neural Network) and recurrent neural networks (RNN, Recurrent Neural Network), have become increasingly popular and have achieved state-of-the-art performance on numerous classification tasks. However, existing DNN-based URL classification solutions typically treat URLs as text data, learn the deep representation of the URL at the token level, and apply the deep learning model directly, regardless of the unique pattern of the URL. Malicious URLs that have short life cycles and change dynamically with different obfuscation strategies greatly affect the security defence and detection of power network information, and how to go beyond existing solutions to effectively detect the attack characteristics of malicious URLs in the power grid is a problem to be solved.
Disclosure of Invention
The invention aims to: provide a network attack detection method and device for a power grid information system based on feature fusion, which realize the effective detection of malicious URLs in the power system that have characteristics such as a short life cycle and dynamic change with different obfuscation strategies.
The technical scheme is as follows: a network attack detection method of a power grid information system based on feature fusion comprises the following steps:
carrying out data preprocessing on sample URL data, including removing repeated samples, trimming data, and formatting, wherein the data is trimmed to remove symbols and characters of specified conditions, the formatting divides the data into two columns, the trimmed URL is placed in a first column, and a label of the URL is placed in a second column, and the label marks whether the URL is malicious or not;
based on the preprocessed URL data, extracting text features including vocabulary features and statistical features, constructing text feature vectors, and learning potential interaction relations between the text feature vectors by using a bilinear factorizer;
based on the preprocessed URL data, performing token extraction to acquire tokens from the URLs, learning vector representations of the URLs through word2vec, and learning distance dependency relationships between the URL token vectors by using a time convolution network, wherein the distance dependency relationships are called structural features;
cooperatively training the bilinear factorizer and the time convolution network by using a self-paced learning strategy; after the whole model is trained, the trained model is used to identify the URL data to be detected, and malicious URL detection is completed based on the identification result of feature fusion; the self-paced learning strategy reduces the entropy value by gradually adding learning data and trains the latent weight parameters, and whether a sample is selected is indicated by a weight variable introduced into the loss function.
Further, the data pruning includes:
for data pruning of extracting text features, firstly selecting characters as the smallest data processing unit for a URL data set, then carrying out character frequency statistics, deleting special characters with frequency lower than a specified number, and carrying out standardization operation on the URL length, wherein the standardization operation comprises the steps of comparing the URL length with a specified length threshold, cutting off parts longer than the specified threshold, and filling short parts with zero;
for data pruning to extract structural features, for the URL dataset, the consecutive string after the last # is deleted, and the consecutive string after the last ? is deleted.
Further, performing token extraction to obtain a token from the URL includes:
The URL is divided into four blocks by position: protocol, domain, path and file, the block before the first "/" serving as the protocol part; the string before the second "/" is defined as the domain part; the string after the last "/" is regarded as the file part; the remaining strings are regarded as the path part. An alignment strategy is used to locate the tokens in different chunks with different types of brackets, wherein each token of the protocol part is placed in braces { }, each token in the domain part is placed in parentheses ( ), each token in the path part is placed in angle brackets < >, and each token in the file part is placed in square brackets [ ].
Further, the learning of the potential interaction relationship between text feature vectors using the bilinear factorizer includes:

$$\hat{y}(x) = \omega_0 + \sum_{i=1}^{n} \omega_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \left\langle v_{i,f_j}, v_{j,f_i} \right\rangle x_i x_j$$

wherein $\omega_0$ is the model bias; $\omega_i \in R$ is the weight of the feature variable $x_i$; $\left\langle v_{i,f_j}, v_{j,f_i} \right\rangle$ characterizes the pairwise interaction between the variables $x_i$ and $x_j$; k represents the hidden vector length; n represents the number of features of the sample; $v_{i,f}$ represents an auxiliary vector of $x_i$; $v_{i,f_j}$ represents the auxiliary vector of the vector $x_i$ in the corresponding domain $f_j$, and $v_{j,f_i}$ is the auxiliary vector of the vector $x_j$ in the corresponding domain $f_i$.
Further, learning the distance dependency relationships between URL tokens using the time convolution network includes:
the time convolution network input layer takes the token-vectorized data as the input of the model; the time convolution network is formed by stacking a plurality of residual modules and is responsible for extracting the time-sequence characteristics of the corresponding sequence; each residual module has one input, called X, and two outputs, both high-dimensional tensors, one representing the features $H_T$ extracted by the module and the other representing the residual $R_T$ output by the module; each residual module consists of 4 one-dimensional convolution layers Conv0, Conv1, Conv2, Conv3: the first convolution layer Conv0 performs preliminary processing on the input, and its output is $C_0$; the input of the second convolution layer Conv1 is $C_0$, and after its output passes through DropOut it is selectively activated using a Sigmoid function, which is called $C_1$; the input of the third convolution layer Conv2 is $C_0$, and its output is activated by a Tanh function after DropOut, which is called $C_2$; $C_1$ and $C_2$ are multiplied element by element and then fed into Conv3, whose dilated convolution parameter is d, and the output is $H_T$; $H_T$ is added to the module input X to obtain the other output $R_T$.
Further, the self-paced learning strategy includes:
given a data set $D = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, where $x_i \in R^m$ represents the features of the i-th URL in D and $y_i$ is the corresponding class of the i-th URL, the loss between the ground truth $y_i$ and the label $\hat{y}_i^{w}$ estimated by the text component is denoted $L_w(y_i, \hat{y}_i^{w})$, and the loss of the structural component is denoted $L_d(y_i, \hat{y}_i^{d})$, where $\hat{y}_i^{d}$ refers to the prediction result of the depth component for the i-th sample;
the self-paced learning strategy co-trains the parameters w of the bilinear factorizer model and the time convolution network model and learns the latent weight variables $v = [v_1, \ldots, v_n]$ by minimizing the following equation:

$$\min_{w,\, v \in [0,1]^n} \; E(w, v; \lambda) = \sum_{i=1}^{n} v_i \left[ L_w\!\left(y_i, \hat{y}_i^{w}\right) + L_d\!\left(y_i, \hat{y}_i^{d}\right) \right] - \lambda \sum_{i=1}^{n} v_i$$

wherein the parameter λ controls the learning pace, $L_w$ refers to the loss of the text part quantified by the logistic loss, and $L_d$ represents the deep structural loss measured by the cross-entropy loss.
A network attack detection device of a power grid information system based on feature fusion comprises:
the preprocessing module is used for preprocessing the data of the sample URL data, and comprises the steps of removing repeated samples, trimming the data, and formatting, wherein the data is trimmed to remove symbols and characters of specified conditions, the formatting divides the data into two columns, the trimmed URL is placed in a first column, and a label of the URL is placed in a second column, and the label marks whether the URL is malicious or not;
the text feature extraction module is used for extracting text features including vocabulary features and statistical features based on the preprocessed URL data, constructing text feature vectors, and learning potential interaction relations among the text feature vectors by using a bilinear factorizer;
the structural feature extraction module is used for performing token extraction based on the preprocessed URL data to obtain tokens from the URLs, learning vector representations of the URLs through word2vec, and learning the distance dependency relations among the URL token vectors by using a time convolution network, wherein the distance dependency relations are called structural features;
the feature fusion module is used for cooperatively training the bilinear factorizer and the time convolution network by utilizing a self-paced learning strategy, identifying the URL data to be detected by utilizing the trained model after the whole model is trained, and completing malicious URL detection based on the identification result of feature fusion, wherein the self-paced learning strategy reduces the entropy value by gradually adding learning data, trains the latent weight parameters, and indicates whether a sample is selected by a weight variable introduced into the loss function.
The present invention also provides a computer device comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, which when executed by the processors implement the steps of the grid information system network attack detection method based on feature fusion as described above.
The invention also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the steps of the network attack detection method of a grid information system based on feature fusion as described above.
The beneficial effects are that: the method comprehensively considers the text features and the structural features of the URL. For the text features, a bilinear factorization machine (FFM) algorithm effectively learns the potential interactions between text features. For the deep structural features, considering that tokens at different positions in malicious URLs have different functions, position embedding is introduced for token vectorization so as to reduce the ambiguity of URL tokens, and a time convolution network (TCN) is used to learn the long-distance dependency relationships among URL tokens, effectively improving feature completeness. After the text features and the structural features are extracted, the two branches are effectively co-trained through a self-paced learning strategy, which ensures that the model fits both simple and diverse samples, and the detection of malicious URLs is finally completed effectively based on the fused features. The invention provides an effective means for solving the problem of detecting malicious URLs in the power system that have characteristics such as short life cycles and dynamic change with different obfuscation strategies.
Drawings
FIG. 1 is a schematic diagram of an overall attack detection method of the present invention;
FIG. 2 is an exploded view of an exemplary URL of the present invention;
fig. 3 is a schematic diagram of a time convolutional neural network of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings.
The technical problem to be solved by the invention is how to fuse heterogeneous features including text features (statistical features, vocabulary features) and deep structural features, and the aim is to provide an interpretable model for detecting malicious URLs, unlike previous studies that mainly focus on vocabulary features. Referring to FIG. 1, after data preprocessing, one part is the semantic processing branch, which extracts text features (vocabulary features and statistical features) and learns the potential interactions between features using a bilinear factorizer; the other part is the spatial processing branch, which adopts position embedding and a time convolution network to learn the deep structural characteristics of the URL. A self-paced learning with diversity strategy (SPLD) is then introduced to effectively co-train the two branches and give the detection result for malicious URLs. Generally, a processing method based on a learning model includes a training phase, in which the model is trained using a training data set, and an application phase, in which actually generated data is predicted using the trained model. The specific processing of the two phases is similar; for brevity and clarity, the following description focuses on the training process of the method of the invention. After training is completed, the application is evidently performed using the trained model.
Referring to fig. 1, the network attack detection method of the power grid information system based on feature fusion provided by the invention comprises the following steps:
and S1, carrying out data preprocessing on the sample URL data.
The URL specifies the location of a Web resource and provides a mechanism to retrieve the corresponding Internet information. Typically a URL consists of four parts: protocol, domain name, path, and file. The protocol part indicates which protocol should be used to access the information specified in the domain. The domain name allows users to access a website by remembering a simple set of words or other characters rather than a long string of numbers. The domain name consists of two parts: a top-level domain and a subdomain. The subdomain, together with the top-level domain, forms a fully qualified domain name that can be used to access the web site. The path is the list of subdirectories from the server root directory to the file. The file part usually refers to the name of the resource, and sometimes also includes a list of parameters.
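By way of illustration only, the following Python sketch decomposes a URL into the four parts described above using the standard urllib library; the function name and the heuristic used to separate the path and file parts are assumptions made for exposition and do not form part of the claimed method.

```python
from urllib.parse import urlparse

def split_url(url: str) -> dict:
    # Illustrative decomposition of a URL into protocol, domain, path and file.
    parsed = urlparse(url if "://" in url else "http://" + url)
    path = parsed.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    # Heuristic (assumption): a segment containing '.' is treated as the file part.
    file_part = last_segment if "." in last_segment else ""
    path_part = path[: len(path) - len(file_part)] if file_part else path
    return {
        "protocol": parsed.scheme,   # e.g. "http"
        "domain": parsed.netloc,     # subdomain + top-level domain
        "path": path_part,           # subdirectories from the server root to the file
        "file": file_part,           # resource name; parameters appear in parsed.query
    }

print(split_url("http://www.example.com/images/pic/logo.png"))
```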
In the embodiment of the invention, URL data of network attacks observed by the power network detection system over one year of operation are collected and aggregated to be used as training data. Data preprocessing is performed on the aggregated data set, including: removing duplicate samples, pruning the data, and formatting.
After removing duplicate samples, the data pruning for the URL dataset is largely divided into two parts: one part is data pruning for extracting URL text features, and the other part is data pruning for extracting URL structural features.
For data pruning to extract text features, for the URL dataset, characters are first selected as the smallest data processing unit, followed by character frequency statistics. Through operations such as deleting low-frequency special characters and normalizing the length of the URL, each URL is ensured to provide useful information and the complexity of the URL is effectively reduced. Statistics show that the occurrence frequency of characters ranked after the 45th index is very low, so these characters are deleted from the URL, and the influence on the text feature information in the URL is negligible. To address the inconsistency in URL lengths, the length of each URL is normalized based on the average URL length in the dataset, which is 45: parts longer than 45 are truncated and shorter URLs are padded with zeros, so that the length of every URL is the same.
For data pruning to extract structural features, for the large-scale URL dataset, data screening is performed according to the following rules: 1) the consecutive string after the last "#" is deleted, because the part after "#" in a URL belongs to the web page itself and is not transmitted to the server side; 2) the consecutive string after the last "?" is deleted, because the part after "?" in a URL typically indicates information for which the Web browser should not use the cache.
Next, data formatting is performed to mark the pruned samples. Specifically, the dataset is formatted into two columns. The pruned URL is placed in the first column, the tag of the URL is placed in the second column, and the tag marks whether the URL is malicious or not. And obtaining a final URL instance data set through data preprocessing, and selecting 4/5 of the final URL instance data set as a training data set.
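For exposition, a minimal Python sketch of this preprocessing pipeline is given below. Only the 45-character length normalization and the two-column [pruned URL, label] layout follow the description above; the data structures, the rare-character threshold and the helper names are assumptions.

```python
import collections
import random

MAX_LEN = 45          # average URL length reported above
CHAR_VOCAB_SIZE = 45  # characters ranked beyond this index are treated as rare (assumption)

def preprocess(samples):
    # samples: list of (url, label) pairs, label = 1 for malicious and 0 for benign
    samples = list(dict.fromkeys(samples))                      # remove duplicate samples

    freq = collections.Counter(ch for url, _ in samples for ch in url)
    kept = {ch for ch, _ in freq.most_common(CHAR_VOCAB_SIZE)}  # drop low-frequency characters

    rows = []
    for url, label in samples:
        pruned = "".join(ch for ch in url if ch in kept)
        pruned = pruned[:MAX_LEN].ljust(MAX_LEN, "0")            # truncate / zero-pad to 45
        rows.append([pruned, label])                             # column 1: URL, column 2: label

    random.shuffle(rows)
    cut = len(rows) * 4 // 5                                     # 4/5 used for training
    return rows[:cut], rows[cut:]

train_rows, test_rows = preprocess([("http://example.com/login", 0),
                                    ("http://bad-site.xyz/paypal/verify", 1)])
```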
And S2, extracting text features, wherein the text features comprise vocabulary features and statistical features, and learning potential interactions between the text features by using a bilinear factorizer.
First, feature engineering is performed on each URL instance in the training dataset, a large number of features including lexical features and statistical features are extracted, and then feature vectors are constructed.
For feature engineering, the present invention uses vocabulary features and URL statistics to characterize text features. The vocabulary features are mainly obtained through matching of character string processing functions, and the statistical features are obtained through calculation based on an editing distance algorithm. Aiming at URL statistical characteristics, the invention constructs a token name word library to calculate the similarity of the token, and obtains the value of the statistical characteristics based on an edit distance algorithm after the token extraction processing is carried out on the URL to be detected. The specific method for token extraction is described in connection with the extraction of structural features in step S3.
Given a URL data set $D = \{(S_1, y_1), \ldots, (S_N, y_N)\}$, a text feature vector $x_i = \{x_1, \ldots, x_p\}$ is first extracted for each URL instance, where $x_i$ is a p-dimensional feature vector and $y_i$ is the label information of sample $S_i$. For this vector, the goal is to find a mapping function $f(x_i)$ that gives the maximum likelihood estimate of $y_i$.
The text features of the URLs described above may not be independent and are extremely sparse. Many classical classification algorithms rely on a conditional-independence assumption, which states that, given the condition variable, the features are independent of each other. This independence assumption is very strong and hardly holds for URL text features. Since the text features described above may not be independent and are extremely sparse, the invention applies a factorization machine (FM) to learn the high-order potential interactions between text features.
A 2nd-order FM model can be expressed as equation (1) below, where $\omega_0$ is the model bias, $\omega_i \in R$ is the weight of the feature variable $x_i$, and $\langle v_i, v_j \rangle$ characterizes the pairwise interaction between variables $x_i$ and $x_j$, as shown in equation (2), where k represents the hidden vector length, n represents the number of features of a sample, and $v_{i,f}$ represents the auxiliary (latent) vector of $x_i$. FM does not require independence assumptions about the features and can be used to model interactions between features. Second, FM models all nested variable interactions using a factorized parameterization rather than a dense parameterization as in SVM, which ensures that FM has linear time complexity O(kn). The strict assumption of feature independence is relaxed by mining the pairwise interactions between features with FM. For classification tasks, the FM model is trained by minimizing the cross-entropy loss, as shown in equation (3), where $p_i$ is the predicted probability and $y_i$ refers to the ground-truth label.

$$\hat{y}(x) = \omega_0 + \sum_{i=1}^{n} \omega_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j \qquad (1)$$

$$\langle v_i, v_j \rangle = \sum_{f=1}^{k} v_{i,f} \cdot v_{j,f} \qquad (2)$$

$$L = -\sum_{i} \left[\, y_i \log p_i + (1 - y_i) \log (1 - p_i) \,\right] \qquad (3)$$

Compared with FM, in which each feature corresponds to only one hidden vector, FFM (Field-aware Factorization Machines) further refines the hidden vectors: each feature in FFM corresponds to a plurality of hidden vectors, and different hidden vectors are used to compute the parameters of the corresponding cross terms when crossing with different features. Obviously, FFM, which has several hidden vectors for each feature, has more parameters than FM, which has only one hidden vector per feature, so the fitting capability of the model is stronger. The feature crossing of FFM is shown in equation (4), where $v_{i,f_j}$ represents the auxiliary vector of the vector $x_i$ in the corresponding domain $f_j$, and $v_{j,f_i}$ is the auxiliary vector of the vector $x_j$ in the corresponding domain $f_i$.

$$\hat{y}(x) = \omega_0 + \sum_{i=1}^{n} \omega_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_{i,f_j}, v_{j,f_i} \rangle x_i x_j \qquad (4)$$
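As a purely illustrative numerical sketch of equation (4), the following Python code evaluates an FFM score for a single sample with randomly initialized latent vectors; the array shapes, field assignment and values are assumptions for exposition and do not represent the trained model of the invention.

```python
import numpy as np

def ffm_score(x, fields, w0, w, V):
    # x: feature values (n,); fields: field index of each feature (n,)
    # V: field-aware latent vectors, shape (n, num_fields, k)
    n = len(x)
    score = w0 + np.dot(w, x)
    for i in range(n):
        for j in range(i + 1, n):
            # equation (4): v_{i,f_j} interacts with v_{j,f_i}
            score += np.dot(V[i, fields[j]], V[j, fields[i]]) * x[i] * x[j]
    return score

rng = np.random.default_rng(0)
n, num_fields, k = 6, 3, 4                       # 6 text features, 3 fields, latent length k = 4
x = rng.random(n)                                # e.g. normalized lexical / statistical features
fields = np.array([0, 0, 1, 1, 2, 2])            # field of each feature (assumption)
print(ffm_score(x, fields, 0.1, rng.normal(size=n), rng.normal(size=(n, num_fields, k))))
```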
Step S3, after data preprocessing, token extraction is performed to obtain tokens from the URLs, vector representations of the URLs are learned through word2vec, and a time convolution network is used to extract and learn the distance dependency relations among the URL token vectors.
A URL representation is intended to build a feature map to characterize URL instances, and a URL can be characterized from different angles. The text features of a URL mainly include the length of the URL, the frequency of separators, special characters, and so on. The vocabulary features only focus on the co-occurrence of tokens and ignore the position information of the tokens. Text features extracted from one dataset may introduce bias when applied to other datasets, which ultimately biases the performance of the trained model. The invention proposes that, in addition to text feature extraction from the text pattern of the URL, feature extraction from the structural pattern of the URL is also required. Omitting the structural information of the URL may blur the tokens, so that nuances between tokens cannot be distinguished. Consider two parts of a URL, the domain portion and the path portion: the same token has different meanings in the two portions, since a token located in the domain portion is an identifier of a particular web site and may indicate the hosting location of the given URL, whereas the path portion represents the relative path.
According to previous studies, malicious URLs are more likely to contain known domain tokens at the path level to fool potential users into clicking on links. To emphasize the impact of token position on malicious URL detection, position embedding of URLs is introduced. To evaluate the effect of position embedding on ambiguity reduction, the invention compares the differences of the token vectors obtained without position embedding. Among URL tokens, com, web, www, login, chase, etc. are typical tokens. Specifically, the 10 most common tokens appearing in the domain portion and the path portion are selected first. Then, for each token, the most similar tokens are retrieved separately for its occurrences in the domain portion and in the path portion. When the token com appears in the domain portion, the tokens most similar to com include nocovernightclubs, elitetv, etc.; when com appears in the path portion, it is most similar to mazarbhai, gatto, windows soul, etc. For the token paypal, it has the highest similarity to webo, maxfocus, ttwschool and freesexwebcam in the domain portion, while it shows a high degree of association with webtvhd and iessay in the path portion. Obviously, the two sets of similar tokens differ significantly as the position of a particular token changes. Meanwhile, the invention compares the token similarity when the same token appears in different parts of URLs: the token www has low similarity between its representation vectors when it appears in different parts. This suggests that tokens appearing in different parts should be considered as two different tokens to avoid token ambiguity. Based on the above analysis, URL tokens are position sensitive and should not be treated directly as a text sequence. Further, in order to minimize token ambiguity in different contexts, the invention uses position embedding to embed position information into the token vectorization, taking the functionality of each URL portion into account.
The implementation process of the specific position embedding comprises the following steps:
after the data preprocessing, token extraction is performed to obtain a token from the URL.
First, the URL is divided into four blocks by position: protocol, domain, path, and file. The block before the first "/" serves as the protocol part; the string before the second "/" is defined as the domain part; the string after the last "/" is regarded as the file part; the remaining strings are regarded as the path part. To overcome the token ambiguity problem, an alignment strategy is used to locate the tokens in different chunks with different types of brackets.
For example, referring to FIG. 2, a URL is broken down into four blocks, each containing multiple chunks, and the alignment strategy is applied to each chunk to minimize token ambiguity. Alignment locates the tokens in different chunks with brackets of different types, so that the uniqueness of a URL token is guaranteed. Specifically, each token of the protocol part is placed in braces { }; each token in the domain part is placed in parentheses ( ); each token in the path part is placed in angle brackets < >; and each token in the file part is placed in square brackets [ ].
The segmented strings are partitioned by a set of separators defined as A = {/, \, :, #, @, ?, =}. In order to minimize information loss, the invention introduces special marker words, including slash, equivalent, dot, hyphen, hash, dash, at, mark and colon, to replace the special delimiters defined in A. As shown in FIG. 2, each separator is replaced by its English word. The tokens within these four blocks are thus partitioned.
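By way of illustration only, the following Python sketch reproduces this block splitting, separator replacement and bracket alignment; the exact separator set, the English marker spellings and the helper names are assumptions inferred from the description and FIG. 2 rather than a verbatim reproduction of the implementation.

```python
import re

SEPARATOR_PATTERN = r"([/\\:.#@?=\-])"
SEP_WORDS = {"/": "slash", "\\": "backslash", ":": "colon", ".": "dot", "#": "hash",
             "@": "at", "?": "mark", "=": "equivalent", "-": "hyphen"}
BRACKETS = {"protocol": "{{{}}}", "domain": "({})", "path": "<{}>", "file": "[{}]"}

def tokenize_block(text, block):
    # Split one block on the separator set A and wrap each token in the block-specific bracket.
    tokens = []
    for piece in re.split(SEPARATOR_PATTERN, text):
        if not piece:
            continue
        word = SEP_WORDS.get(piece, piece)      # replace separators by English marker words
        tokens.append(BRACKETS[block].format(word))
    return tokens

def tokenize_url(url):
    protocol, rest = url.split("://", 1)
    domain, _, tail = rest.partition("/")
    path, _, file_part = tail.rpartition("/")   # the string after the last '/' is the file part
    return (tokenize_block(protocol, "protocol") + tokenize_block(domain, "domain")
            + tokenize_block(path, "path") + tokenize_block(file_part, "file"))

print(tokenize_url("http://www.example.com/images/pic/logo.png"))
```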
After token extraction, the generated token sequence is sent to the word2vec model, embedding the word into the low dimensional vector space.
For deep structural features, the present invention segments each URL instance $A_i$ with separators. After the tokens are separated, the original URL $A_i$ can be expressed as a token sequence $A_i = [a_1, \ldots, a_n]$, where $a_i$ represents a token and n represents the number of tokens in the pruned URL. After token extraction, the generated token sequence is fed into the word2vec model. word2vec is a two-layer neural network widely used in natural language processing that embeds words into a low-dimensional vector space, in which vectors that are close to each other based on context have similar meanings, while word vectors that are far from each other have different meanings. The idea of word2vec is followed to learn the vector representation of URL tokens so as to capture the syntactic and semantic relationships between URL tokens. In practice, the skip-gram model is used to train the vectorized representation of URL tokens.
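As an illustrative sketch only, the bracket-aligned token sequences can be fed to a skip-gram word2vec model, for example with the gensim library; the choice of library, the 32-dimensional vector size, the window and the toy token sequences are assumptions for exposition.

```python
from gensim.models import Word2Vec

# One token list per preprocessed URL, e.g. the output of tokenize_url above (toy data).
token_sequences = [
    ["{http}", "(www)", "(example)", "(com)", "<images>", "[logo]", "[png]"],
    ["{http}", "(login)", "(example-update)", "(com)", "<paypal>", "[verify]", "[php]"],
]

# sg=1 selects the skip-gram architecture used here to learn URL token vectors.
model = Word2Vec(sentences=token_sequences, vector_size=32, window=5,
                 min_count=1, sg=1, epochs=50)

vec = model.wv["<paypal>"]                       # 32-dimensional embedding of a path-level token
print(vec.shape, model.wv.most_similar("(com)", topn=3))
```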
In view of the fact that tokens at different positions have different functional characteristics, the ambiguity of the tokens is reduced by position embedding of the URL representation. Unlike methods that treat a URL as plain text data, the position embedding of the URL representation uses the structural pattern of the URL and distinguishes tokens according to their structural function. Given a model to be trained, position embedding is used to learn the vectorization of the tokens, and a model trained with position embedding tends to converge quickly, which is critical for model training on a large data set with lower training cost and less computation.
After token vectorization, the pruned URL may be denoted as $[u_1, \ldots, u_n]^{T}$, where $u_i$ is the vectorized representation of the i-th token. For this part of the vectors, the next goal is to find an objective function that approaches the ground truth $y_i$.
About 25% of malicious URLs are hosted on trusted domains, because trusted domains are less likely to be suspected and are more difficult to block. To address this problem, the dependencies between tokens can facilitate malicious URL detection. The preprocessed URL can be seen as a text sequence, but a deep learning model cannot directly receive the original text sequence as input; it only processes numerical tensors. Therefore, how to encode the information contained in the URL into a numerical representation is an important premise for model identification and detection, and this is typically done by word embedding. The importance of token position in the URL is fully considered, and the data are vectorized mainly through position embedding and converted into input data acceptable to the deep learning model.
Regarding the choice of deep learning model: CNN cannot encode temporal order information and has a small receptive field; RNN cannot preserve long-range dependencies and incurs a huge training cost in terms of time and space complexity. Therefore, the invention selects a temporal convolutional network (TCN) to learn the associations within the URL structure. The TCN model belongs to the family of convolutional neural network models and is essentially a one-dimensional CNN specially modified for time-series problems, so the TCN is used to learn the long-range dependencies among URL tokens. The general framework of the TCN consists of three components: causal convolution, dilated convolution, and residual connections. Causal convolution depends only on the current and past input values. Formally, the causal convolution operation is a mapping function $f(x^{(1)}, \ldots, x^{(t)}) \to y^{(t)}$ such that $y^{(t)}$ depends only on $x^{(1)}, \ldots, x^{(t)}$ and not on any "future" input $x^{(t+1)}, \ldots, x^{(t+n)}$. To build deeper networks, residual blocks as shown in FIG. 3 are introduced in the TCN to increase the network depth and hence its expressivity, and the skip connections in the residual blocks help preserve the norm of the gradients and lead to stable back-propagation. First, the TCN alleviates the problems of gradient vanishing and gradient explosion and does not require back-propagation through time during training. Second, the dilated convolution in the TCN makes the receptive field grow exponentially with the depth of the network, which makes it possible to build deeper neural networks. Third, the TCN has a higher degree of parallelism than an RNN, which makes its training and deployment computationally more feasible. Furthermore, the TCN is able to learn long-range dependencies from sequential inputs through a hierarchy of temporal convolution filters, pooling, and upsampling.
According to the embodiment of the invention, the input layer takes the token-vectorized data as the input of the model, and the TCN is formed by stacking a plurality of residual modules and is responsible for extracting the time-sequence characteristics of the corresponding sequence. Each residual module has one input, called X, and two outputs, both high-dimensional tensors: one represents the features $H_T$ extracted by the module, and the other represents the residual $R_T$ output by the module. Each residual module consists of 4 one-dimensional convolution layers Conv0, Conv1, Conv2, Conv3. The first convolution layer Conv0 performs preliminary processing on the input, and its output is $C_0$. The input of the second convolution layer Conv1 is $C_0$; after its output passes through DropOut, it is selectively activated using a Sigmoid function and is called $C_1$. The input of the third convolution layer Conv2 is $C_0$; after its output passes through DropOut, it is activated by a Tanh function and is called $C_2$. $C_1$ and $C_2$ are multiplied element by element and then fed into Conv3, whose dilated convolution parameter is d, and the output is $H_T$. $H_T$ is added to the module input X to obtain the other output $R_T$.
Considering that one TCN layer may be composed of a plurality of residual modules, when a plurality of residual modules are stacked, the output $R_T^{(k-1)}$ of the (k-1)-th residual module is taken as the input of the k-th residual module; the outputs $H_T$ of all modules are added and activated by the ReLU function to give the output $Y_T$ of the TCN.
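For illustration only, the following PyTorch sketch mirrors the residual module and stacking described above; the channel width, kernel size, dropout rate and number of modules are assumptions, and symmetric (non-causal) padding is used for brevity where a faithful implementation would use causal padding.

```python
import torch
import torch.nn as nn

class ResidualModule(nn.Module):
    # One residual module: Conv0 pre-processes the input, Conv1 (Sigmoid gate) and
    # Conv2 (Tanh) both read C0, their element-wise product passes the dilated Conv3
    # to give H_T, and H_T + X gives the residual output R_T.
    def __init__(self, channels, kernel_size=3, dilation=1, p_drop=0.1):
        super().__init__()
        pad = (kernel_size - 1) * dilation // 2
        conv = lambda: nn.Conv1d(channels, channels, kernel_size, padding=pad, dilation=dilation)
        self.conv0, self.conv1, self.conv2, self.conv3 = conv(), conv(), conv(), conv()
        self.drop = nn.Dropout(p_drop)

    def forward(self, x):                                 # x: (batch, channels, length)
        c0 = self.conv0(x)
        c1 = torch.sigmoid(self.drop(self.conv1(c0)))     # gating branch
        c2 = torch.tanh(self.drop(self.conv2(c0)))        # activation branch
        h_t = self.conv3(c1 * c2)                         # extracted features H_T
        return h_t, h_t + x                               # (H_T, residual R_T)

class TCN(nn.Module):
    # R_T of module k-1 feeds module k; the H_T of all modules are summed and passed to ReLU.
    def __init__(self, channels, num_modules=3):
        super().__init__()
        self.blocks = nn.ModuleList(ResidualModule(channels, dilation=2 ** i)
                                    for i in range(num_modules))

    def forward(self, x):
        h_sum, r = 0, x
        for block in self.blocks:
            h_t, r = block(r)
            h_sum = h_sum + h_t
        return torch.relu(h_sum)                          # Y_T

y = TCN(channels=32)(torch.randn(4, 32, 45))              # 4 URLs, 32-dim tokens, length 45
print(y.shape)
```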
And S4, after extracting the text features and deep structural features, a collaborative training strategy is introduced to fit the vocabulary features and the deep structural features simultaneously. To achieve this objective, the invention applies self-paced learning with diversity to the co-training of the two branches to improve robustness and efficiency.
In order to integrate the vocabulary features and the structural features, the invention introduces a self-paced learning with diversity strategy to measure the importance of different features for malicious URL detection. Given a training dataset $D = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, where $x_i \in R^m$ represents the features of the i-th URL in D and $y_i$ is the corresponding class of the i-th URL, the loss between the ground truth $y_i$ and the label $\hat{y}_i^{w}$ estimated by the text component is denoted $L_w(y_i, \hat{y}_i^{w})$. Similarly, the loss of the structural component is denoted $L_d(y_i, \hat{y}_i^{d})$, where $\hat{y}_i^{d}$ refers to the prediction result of the deep component for the i-th sample. To fuse the semantic and structural features, the self-paced learning strategy aims to co-train the parameters w of the wide and deep models and to learn the latent weight variables $v = [v_1, \ldots, v_n]$ by minimizing equation (5), where the wide model refers to the bilinear factorizer model over the text features and the deep model refers to the time convolution network model.

$$\min_{w,\, v \in [0,1]^n} \; E(w, v; \lambda) = \sum_{i=1}^{n} v_i \left[ L_w\!\left(y_i, \hat{y}_i^{w}\right) + L_d\!\left(y_i, \hat{y}_i^{d}\right) \right] - \lambda \sum_{i=1}^{n} v_i \qquad (5)$$
Self-paced learning starts by selecting the simplest subset of the data and gradually adds more complex data, so that the entropy value is reduced and the latent weight parameters are trained. Whether a sample is selected is indicated by the weight variable introduced into the loss function.
In equation (5), the parameter λ controls the learning pace. $L_w$ refers to the loss of the text part quantified by the logistic loss; $L_d$ represents the deep structural loss measured by the cross-entropy loss. The goal of SPL is to minimize the weighted training loss together with a negative l1-norm regularizer. The invention applies an iterative biconvex optimization strategy to solve equation (5): when v is fixed, equation (5) can be regarded as a standard supervised learning objective function, and the parameter w is learned by minimizing the loss.
For a given w, the global optimum $v^{*} = [v_1^{*}, \ldots, v_n^{*}]$ can be calculated by equation (6), where $v_i$ is a gate variable that controls whether a sample should be selected into the training subset according to its easiness.

$$v_i^{*} = \begin{cases} 1, & L_w\!\left(y_i, \hat{y}_i^{w}\right) + L_d\!\left(y_i, \hat{y}_i^{d}\right) < \lambda \\ 0, & \text{otherwise} \end{cases} \qquad (6)$$
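For exposition, a minimal Python sketch of this alternating optimization is given below; it assumes two callables, one returning the per-sample combined loss L_w + L_d under the current FFM and TCN parameters and one retraining both models on the samples weighted by v. Both callables, the pace schedule and the numbers are assumptions rather than the patented implementation.

```python
import numpy as np

def spl_select(losses, lam):
    # Equation (6): select a sample (v_i = 1) only if its combined loss is below lambda.
    return (losses < lam).astype(float)

def self_paced_training(losses_fn, update_models_fn, lam=0.5, grow=1.3, epochs=10):
    v = None
    for _ in range(epochs):
        v = spl_select(losses_fn(), lam)   # fix w, solve for v in closed form
        update_models_fn(v)                # fix v, minimize the weighted loss w.r.t. w
        lam *= grow                        # gradually admit harder samples
    return v

# Toy run with synthetic losses that shrink as "training" proceeds.
state = {"losses": np.random.rand(8) * 2}
v = self_paced_training(lambda: state["losses"],
                        lambda v: state.update(losses=state["losses"] * 0.8))
print(v)
```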
To further account for the diversity of the URL samples (the trained model may otherwise be biased by significant performance differences between samples), self-paced learning with diversity (SPLD) can further be used, expressed by equation (7).

$$\min_{w,\, v \in [0,1]^n} \; E(w, v; \lambda, \gamma) = \sum_{i=1}^{n} v_i \left[ L_w\!\left(y_i, \hat{y}_i^{w}\right) + L_d\!\left(y_i, \hat{y}_i^{d}\right) \right] - \lambda \sum_{i=1}^{n} v_i - \gamma \sum_{j=1}^{b} \left\| v^{(j)} \right\|_2 \qquad (7)$$

Unlike equation (5), an additional regularization term is introduced in equation (7), where $-\gamma \sum_{j=1}^{b} \| v^{(j)} \|_2$ refers to the diversity term and b is the number of partitions during training. The SPLD strategy can control how many simple samples are selected and also balance the diversity of the selection across the different partitions.
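The sample selection under equation (7) can be sketched as follows; the per-partition ranking rule with the threshold λ + γ/(√i + √(i-1)) is the standard SPLD closed-form solution and is assumed here to correspond to the diversity term above.

```python
import numpy as np

def spld_select(losses, groups, lam, gamma):
    # Within each partition, samples are ranked by loss; the selection threshold
    # lam + gamma / (sqrt(rank) + sqrt(rank - 1)) loosens with rank, so that some
    # samples are picked from every partition (standard SPLD rule, assumed here).
    v = np.zeros(len(losses))
    for g in np.unique(groups):
        idx = np.where(groups == g)[0]
        order = idx[np.argsort(losses[idx])]
        for rank, i in enumerate(order, start=1):
            threshold = lam + gamma / (np.sqrt(rank) + np.sqrt(rank - 1))
            v[i] = 1.0 if losses[i] < threshold else 0.0
    return v

losses = np.array([0.2, 0.9, 0.4, 1.5, 0.3, 0.8])
groups = np.array([0, 0, 0, 1, 1, 1])              # b = 2 partitions of the training data
print(spld_select(losses, groups, lam=0.5, gamma=0.3))
```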
In order to increase the robustness of the model to be trained, the invention uses the self-paced learning strategy to balance the easiness and the diversity of the samples. In general, the fusion strategy can be expressed through the regularization terms of the loss functions defined above, and such regularization terms can be applied to different tasks to balance the simplicity and diversity of samples.
By contrast, equation (5) mainly follows a co-training scheme in the spirit of semi-supervised learning, but such a method alone is limited in robustness and efficiency, so self-paced learning is introduced and optimized through equation (7). In the co-training method of the invention, the two classifiers are, respectively, the bilinear FFM over the text features and the TCN based on the structural features.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Claims (7)

1. The network attack detection method of the power grid information system based on the feature fusion is characterized by comprising the following steps of:
carrying out data preprocessing on sample URL data, including removing repeated samples, trimming data, and formatting, wherein the data is trimmed to remove symbols and characters of specified conditions, the formatting divides the data into two columns, the trimmed URL is placed in a first column, and a label of the URL is placed in a second column, and the label marks whether the URL is malicious or not;
Based on the preprocessed URL data, extracting text features including vocabulary features and statistical features, constructing text feature vectors, and learning potential interaction relations between the text feature vectors by using a bilinear factorizer;
based on the preprocessed URL data, performing token extraction to acquire tokens from the URLs, learning vector representations of the URLs through word2vec, and learning distance dependency relationships between the URL token vectors by using a time convolution network, wherein the distance dependency relationships are called structural features;
cooperatively training the bilinear factorizer and the time convolution network by using a self-paced learning strategy, wherein after the whole model is trained, the trained model is used to identify URL data to be detected, and malicious URL detection is completed based on the identification result of feature fusion, the self-paced learning strategy reduces the entropy value by gradually adding learning data and trains latent weight parameters, and a weight variable is introduced into the loss function to indicate whether a sample is selected;
wherein learning the potential interaction relationship between text feature vectors using a bilinear factorizer comprises:

$$\hat{y}(x) = \omega_0 + \sum_{i=1}^{n} \omega_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \left\langle v_{i,f_j}, v_{j,f_i} \right\rangle x_i x_j$$

wherein $\omega_0$ is the model bias; $\omega_i \in R$ is the weight of the feature variable $x_i$; $\left\langle v_{i,f_j}, v_{j,f_i} \right\rangle$ characterizes the pairwise interaction between the variables $x_i$ and $x_j$; k represents the hidden vector length; n represents the number of features of the sample; $v_{i,f}$ represents an auxiliary vector of $x_i$; $v_{i,f_j}$ represents the auxiliary vector of the vector $x_i$ in the corresponding domain $f_j$, and $v_{j,f_i}$ is the auxiliary vector of the vector $x_j$ in the corresponding domain $f_i$;
performing token extraction to obtain a token from a URL includes:
the URL is divided into four blocks by position: protocol, domain, path and file, the block before the first "/" serving as the protocol part; the string before the second "/" is defined as the domain part; the string after the last "/" is regarded as the file part; the remaining strings are regarded as the path part, and an alignment strategy is used to locate the tokens in different chunks with different types of brackets, wherein each token of the protocol part is placed in braces { }, each token in the domain part is placed in parentheses ( ), each token in the path part is placed in angle brackets < >, and each token in the file part is placed in square brackets [ ];
learning the distance dependency relationships between URL token vectors by using the time convolution network comprises the following steps:
the time convolution network input layer takes the token-vectorized data as the input of the model; the time convolution network is formed by stacking a plurality of residual modules and is responsible for extracting the time-sequence characteristics of the corresponding sequence; each residual module has one input, called X, and two outputs, both high-dimensional tensors, one representing the features $H_T$ extracted by the module and the other representing the residual $R_T$ output by the module; each residual module consists of 4 one-dimensional convolution layers Conv0, Conv1, Conv2, Conv3: the first convolution layer Conv0 performs preliminary processing on the input, and its output is $C_0$; the input of the second convolution layer Conv1 is $C_0$, and after its output passes through DropOut it is selectively activated using a Sigmoid function, which is called $C_1$; the input of the third convolution layer Conv2 is $C_0$, and its output is activated by a Tanh function after DropOut, which is called $C_2$; $C_1$ and $C_2$ are multiplied element by element and then fed into Conv3, whose dilated convolution parameter is d, and the output is $H_T$; $H_T$ is added to the module input X to obtain the other output $R_T$.
2. The method of claim 1, wherein the data pruning comprises:
for data pruning of extracting text features, firstly selecting characters as the smallest data processing unit for a URL data set, then carrying out character frequency statistics, deleting special characters with frequency lower than a specified number, and carrying out standardization operation on the URL length, wherein the standardization operation comprises the steps of comparing the URL length with a specified length threshold, cutting off parts longer than the specified threshold, and filling short parts with zero;
for data pruning to extract structural features, for the URL dataset, the consecutive string after the last # is deleted, and the consecutive string after the last ? is deleted.
3. The method of claim 1, wherein the self-paced learning strategy comprises:
given a data set $D = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, where $x_i \in R^m$ represents the features of the i-th URL in D and $y_i$ is the corresponding class of the i-th URL, the loss between the ground truth $y_i$ and the label $\hat{y}_i^{w}$ estimated by the text component is denoted $L_w(y_i, \hat{y}_i^{w})$, and the loss of the structural component is denoted $L_d(y_i, \hat{y}_i^{d})$, where $\hat{y}_i^{d}$ refers to the prediction result of the depth component for the i-th sample;
the self-paced learning strategy co-trains the parameters w of the bilinear factorizer model and the time convolution network model and learns the latent weight variables $v = [v_1, \ldots, v_n]$ by minimizing the following equation:

$$\min_{w,\, v \in [0,1]^n} \; E(w, v; \lambda) = \sum_{i=1}^{n} v_i \left[ L_w\!\left(y_i, \hat{y}_i^{w}\right) + L_d\!\left(y_i, \hat{y}_i^{d}\right) \right] - \lambda \sum_{i=1}^{n} v_i$$

wherein the parameter λ controls the learning pace, $L_w$ refers to the loss of the text part quantified by the logistic loss, and $L_d$ represents the deep structural loss measured by the cross-entropy loss.
4. The method of claim 1, wherein the vectorized representation of URL tokens is trained using a skip-gram model.
5. The utility model provides a network attack detection device of electric wire netting information system based on feature fusion which characterized in that includes:
the preprocessing module is used for preprocessing the data of the sample URL data, and comprises the steps of removing repeated samples, trimming the data, and formatting, wherein the data is trimmed to remove symbols and characters of specified conditions, the formatting divides the data into two columns, the trimmed URL is placed in a first column, and a label of the URL is placed in a second column, and the label marks whether the URL is malicious or not;
The text feature extraction module is used for extracting text features including vocabulary features and statistical features based on the preprocessed URL data, constructing text feature vectors, and learning potential interaction relations among the text feature vectors by using a bilinear factorizer;
the structural feature extraction module is used for performing token extraction on the preprocessed URL data to obtain tokens from the URLs, learning vector representations of the URL tokens through word2vec, and learning the long-distance dependency relationships among the URL token vectors by using a time convolution network, wherein the long-distance dependency relationships are called structural features;
the feature fusion module is used for co-training the bilinear factorization machine and the time convolution network by using a self-scheduled learning strategy, and, after the whole model is trained, identifying URL data to be detected by using the trained model and completing malicious URL detection based on the feature-fusion identification result, wherein the self-scheduled learning strategy reduces the entropy value by gradually adding learning data, trains the latent weight parameters, and introduces weight variables into the loss function to indicate whether a sample is selected;
wherein learning the potential interaction relationships between text feature vectors using the bilinear factorization machine comprises:

$$\hat{y}(x) = \omega_0 + \sum_{i=1}^{n} \omega_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_{i,f_j}, v_{j,f_i} \rangle \, x_i x_j$$

wherein $\omega_0$ is the model bias; $\omega_i \in R$ models the weight of the feature variable $x_i$; $\langle v_{i,f_j}, v_{j,f_i} \rangle$ characterizes the pairwise interaction between the variables $x_i$ and $x_j$; $k$ represents the length of the hidden vectors; $n$ represents the number of features of a sample; $v_{i,f_j}$ is the auxiliary vector of $x_i$ in the corresponding domain $f_j$, and $v_{j,f_i}$ is the auxiliary vector of $x_j$ in the corresponding domain $f_i$;
performing token extraction to obtain tokens from a URL comprises:

dividing the URL into four blocks by position: protocol, domain, path and file, wherein the block before the first '/' is the protocol part, the character string before the second '/' is defined as the domain part, the character string after the last '/' is regarded as the file part, and the remaining character strings are regarded as the path part; an alignment strategy is used to place the tokens of the different blocks in brackets of different types, wherein each token of the protocol part is placed in braces { }, each token of the domain part is placed in parentheses ( ), each token of the path part is placed in angle brackets < >, and each token of the file part is placed in square brackets [ ] (a code sketch of this partition follows claim 5);
learning the long-distance dependency relationships between URL token vectors by using the time convolution network comprises:

the time convolution network input layer takes the token-vectorized data as the input of the model; the time convolution network is formed by stacking a plurality of residual modules and is responsible for extracting the time-sequence characteristics of the corresponding sequence; each residual module has one input, denoted X, and two outputs, both high-dimensional tensors: one represents the characteristics H_T extracted by the module and the other represents the residual R_T output by the module; each residual module consists of four one-dimensional convolution layers Conv0, Conv1, Conv2 and Conv3: the first convolution layer Conv0 performs preliminary processing on the input, and its output is C_0; the input of the second convolution layer Conv1 is C_0, and after DropOut its output is selectively activated with a Sigmoid function and denoted C_1; the input of the third convolution layer Conv2 is C_0, and after DropOut its output is selectively activated with a Tanh function and denoted C_2; C_1 and C_2 are multiplied element by element, the product and the dilated-convolution parameter d are fed into Conv3, and the output is H_T; H_T is added to the module input X to obtain the other output R_T.
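A minimal sketch of the positional partition and bracket alignment described in this claim; the splitting below approximates the '/'-position rules with Python string operations, and the token delimiters in the regular expression as well as the helper name extract_aligned_tokens are illustrative assumptions:

import re

def extract_aligned_tokens(url: str) -> str:
    """Split a URL into protocol / domain / path / file blocks and wrap each
    block's tokens in the bracket type assigned to that block."""
    protocol, _, rest = url.partition("//")            # "http:" | "example.com/a/b/index.php"
    protocol = protocol.rstrip(":")
    parts = rest.split("/")
    domain = parts[0] if parts else ""
    file_part = parts[-1] if len(parts) > 1 else ""
    path_parts = parts[1:-1] if len(parts) > 2 else []

    tokens = lambda s: [t for t in re.split(r"[.\-_=?&]", s) if t]   # assumed token delimiters
    aligned = []
    aligned += ["{%s}" % t for t in tokens(protocol)]                       # protocol tokens -> { }
    aligned += ["(%s)" % t for t in tokens(domain)]                         # domain tokens   -> ( )
    aligned += ["<%s>" % t for seg in path_parts for t in tokens(seg)]      # path tokens     -> < >
    aligned += ["[%s]" % t for t in tokens(file_part)]                      # file tokens     -> [ ]
    return " ".join(aligned)

# toy usage (hypothetical URL)
print(extract_aligned_tokens("http://example.com/a/b/index.php"))
# {http} (example) (com) <a> <b> [index] [php]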
6. A computer device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and when executed by the one or more processors implement the steps of the feature-fusion-based network attack detection method for a power grid information system according to any one of claims 1-4.
7. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, implements the steps of the feature fusion based network attack detection method of a grid information system according to any of claims 1-4.
CN202210905203.XA 2022-07-29 2022-07-29 Network attack detection method and device for power grid information system based on feature fusion Active CN115242539B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210905203.XA CN115242539B (en) 2022-07-29 2022-07-29 Network attack detection method and device for power grid information system based on feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210905203.XA CN115242539B (en) 2022-07-29 2022-07-29 Network attack detection method and device for power grid information system based on feature fusion

Publications (2)

Publication Number Publication Date
CN115242539A CN115242539A (en) 2022-10-25
CN115242539B true CN115242539B (en) 2023-06-06

Family

ID=83677542

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210905203.XA Active CN115242539B (en) 2022-07-29 2022-07-29 Network attack detection method and device for power grid information system based on feature fusion

Country Status (1)

Country Link
CN (1) CN115242539B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116956212A (en) * 2023-06-27 2023-10-27 四川九洲视讯科技有限责任公司 Multi-source visual information feature recognition and extraction method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110009049A (en) * 2019-04-10 2019-07-12 江南大学 It is a kind of based on from step tied mechanism can supervision image classification method
CN110689081A (en) * 2019-09-30 2020-01-14 中国科学院大学 Weak supervision target classification and positioning method based on bifurcation learning
CN111696011A (en) * 2020-06-04 2020-09-22 信雅达系统工程股份有限公司 Monitoring and regulating student autonomous learning system and method thereof
CN113596007A (en) * 2021-07-22 2021-11-02 广东电网有限责任公司 Vulnerability attack detection method and device based on deep learning

Also Published As

Publication number Publication date
CN115242539A (en) 2022-10-25

Similar Documents

Publication Publication Date Title
CN108667816B (en) Network anomaly detection and positioning method and system
CN111600919B (en) Method and device for constructing intelligent network application protection system model
CN114330322A (en) Threat information extraction method based on deep learning
Mohan et al. Spoof net: syntactic patterns for identification of ominous online factors
CN113596007B (en) Vulnerability attack detection method and device based on deep learning
CN112492059A (en) DGA domain name detection model training method, DGA domain name detection device and storage medium
CN115242539B (en) Network attack detection method and device for power grid information system based on feature fusion
Muslihi et al. Detecting SQL injection on web application using deep learning techniques: a systematic literature review
Fang et al. Effective method for detecting malicious PowerShell scripts based on hybrid features
CN115883261A (en) ATT and CK-based APT attack modeling method for power system
Hu et al. Cross-site scripting detection with two-channel feature fusion embedded in self-attention mechanism
Remmide et al. Detection of phishing URLs using temporal convolutional network
Charan et al. Dmapt: Study of data mining and machine learning techniques in advanced persistent threat attribution and detection
Herath et al. Real-time evasion attacks against deep learning-based anomaly detection from distributed system logs
CN113918936A (en) SQL injection attack detection method and device
CN109508544B (en) Intrusion detection method based on MLP
CN116760599A (en) Network attack detection method of power grid information system based on feature fusion
Patil et al. Learning to Detect Phishing Web Pages Using Lexical and String Complexity Analysis
CN116232708A (en) Attack chain construction and attack tracing method and system based on text threat information
Zhang et al. Attack prediction in Internet of Things using knowledge graph
Luo et al. A phishing account detection model via network embedding for Ethereum
Khatun et al. An Approach to Detect Phishing Websites with Features Selection Method and Ensemble Learning
Cheon et al. A Novel Hybrid Deep Learning Approach to Code Generation Aimed at Mitigating the Real-Time Network Attack in the Mobile Experiment Via GRU-LM and Word2vec
Patidar et al. Leveraging LSTM-RNN combined with SVM for Network Intrusion Detection
Li et al. Towards Query-limited Adversarial Attacks on Graph Neural Networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant