CN113806739B - Business access data detection method based on deep learning - Google Patents

Publication number
CN113806739B
Authority
CN
China
Prior art keywords
data
layer
request
formula
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111084993.1A
Other languages
Chinese (zh)
Other versions
CN113806739A (en)
Inventor
田新远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Huaqing Xin'an Technology Co ltd
Original Assignee
Beijing Huaqing Xin'an Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Huaqing Xin'an Technology Co ltd
Priority to CN202111084993.1A
Publication of CN113806739A
Application granted
Publication of CN113806739B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/554 Detecting local intrusion or implementing counter-measures involving event detection and direct action
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Machine Translation (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep learning-based service access data detection method, comprising: vectorizing the request header and the other parts of the original request data separately, then inputting the vector matrix of the request header into a fully connected network model for training, and judging whether the request header is white data. In the fully connected network model, the output of the current network layer is the input of the next network layer, and the current network layer is computed according to formula (I):
[Formula (I): rendered as an image in the original; not reproduced here]
The parameters are updated according to formulas (II) and (III):
[Formula (II): rendered as an image in the original; not reproduced here]
[Formula (III): rendered as an image in the original; not reproduced here]
The method first detects the request header of the request data and then detects the other parts of the request data, so it can detect white data in service access accurately and quickly, with an accuracy of about 99 percent and a recall of about 97 percent.

Description

Business access data detection method based on deep learning
Technical Field
The invention relates to the technical field of network security big data. More particularly, the invention relates to a service access data detection method based on deep learning.
Background
Science and technology are a double-edged sword: the rapid development of network technology brings great convenience to people's lives while placing higher demands on network security technology. People's basic daily needs (clothing, food, housing, and transportation) are now digitized through the network, and large companies store all of this data in databases in specific forms. Vulnerabilities in the network are often exploited by criminals, whose attacks are hard to see and often cause extremely serious consequences. In recent years there has been no shortage of research on network security models, but that research has focused on learning the characteristics of malicious data, and the results have not been very good.
Disclosure of Invention
An object of the present invention is to solve at least the above problems and to provide at least the advantages described later.
The invention also aims to provide a deep learning-based service access data detection method that first detects the request header of the request data and then detects the other parts of the request data, so that white data in service access can be detected more accurately and quickly, with an accuracy of about 99 percent and a recall of about 97 percent.
To achieve these objects and other advantages in accordance with the purpose of the invention, there is provided a deep learning-based service access data detection method, comprising: vectorizing the request header and the other parts of the original request data separately, inputting the vector matrix of the request header into a fully connected network model for training, and judging whether the request header is white data; in the fully connected network model, the output of the current network layer is the input of the next network layer, and the current network layer is computed according to formula (I):
[Formula (I): rendered as an image in the original; not reproduced here]
In formula (I), $y$ is the output of the current network layer, $w_i$ is a weight matrix, $x_i$ is the input of the i-th neuron, $b$ is a bias parameter, and $n$ is the number of neurons, where $n$ is a positive integer;
wherein each parameter is updated according to formula (II) and formula (III):
[Formula (II): rendered as an image in the original; not reproduced here]
[Formula (III): rendered as an image in the original; not reproduced here]
In formulas (II) and (III), $\alpha$ is the learning rate, $b_i$ is the bias parameter of the i-th neuron, $b_l$ is the bias parameter of the l-th network layer, $w_i$ is the weight matrix of the i-th neuron, and $W_l$ is the weight matrix of the l-th network layer, where $i$ is a positive integer from 1 to $n$ and $1 \le l \le 4$.
Most traditional network security models extract features from abnormal data, but with the rapid development of network technology the computation such models require has grown greatly, their running speed has become slower and slower, and their response time to abnormal access has suffered. In request data, anomalies can appear in any part of the request; feeding the whole request directly into a model costs both time and memory. The detection model in this method is divided into two parts. The first part has a simple network structure, computes quickly, and responds quickly to data: if the request header is abnormal, the request is directly judged to be abnormal data; if the request header is not abnormal, the feature vectors of the other parts flow into the second part for judgment. This improves the processing speed of the overall detection model to a certain extent. The invention overturns the traditional approach by focusing on learning the characteristics of white data, so that white data in service access can be identified more quickly and accurately.
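The layer computation and the gradient-descent update described for the fully connected network can be illustrated with a minimal sketch. Because formulas (I) to (III) are not reproduced in this text, the sketch assumes the common form y = Σ w_i·x_i + b for a single output and a squared-error loss for the update; both are assumptions, and the function names are hypothetical.

```python
def dense_forward(weights, inputs, bias):
    """Weighted sum of the inputs plus a bias (assumed form of formula (I))."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def gradient_step(weights, bias, inputs, target, lr):
    """One gradient-descent update for the loss 0.5 * (y - target) ** 2.

    The squared-error loss is an illustrative assumption, not the patent's loss.
    """
    y = dense_forward(weights, inputs, bias)
    error = y - target  # derivative of the loss with respect to y
    new_weights = [w - lr * error * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * error
    return new_weights, new_bias


# Fit a single neuron to one training example.
weights, bias = [0.0, 0.0], 0.0
inputs, target = [1.0, 2.0], 1.0
for _ in range(50):
    weights, bias = gradient_step(weights, bias, inputs, target, lr=0.1)
prediction = dense_forward(weights, inputs, bias)
```

Here `lr` plays the role of the learning rate α: each step moves the weights and the bias against the gradient of the loss.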
Preferably, the deep learning-based service access data detection method further includes: when the request header of the original request data is white data, inputting the vector matrix of the other parts of the original request data into a convolutional neural network model for training; the convolutional neural network model consists of convolutional layers, pooling layers, and fully connected layers; the convolution operation is:
$\alpha_i = f(W \cdot X_{i:i+h-1} + b_j)$ (IV)
In formula (IV), $\alpha_i$ represents the feature obtained by the i-th convolution operation; $f$ represents the activation function; $h$ represents the height of the convolution kernel; $W$ represents the weight matrix of the convolution kernel; $b_j$ represents the bias parameter of the j-th convolution kernel;
through the pooling operation, the final feature is obtained: $t = \max\{\alpha_1, \alpha_2, \ldots, \alpha_{n-h+1}\}$
The prediction result output by the fully connected layer is given by formula (VI):
[Formula (VI): rendered as an image in the original; not reproduced here]
In formula (VI), the symbols for the predicted value and for the weight matrix of the fully connected layer are rendered as images in the original and are not reproduced here; $T$ represents the final feature vector and $b_m$ represents the bias parameter of the fully connected layer. The convolutional layers mainly extract features, the pooling layers mainly reduce dimensionality and prevent overfitting, and the fully connected layers output the final result.
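The convolution in formula (IV) and the max pooling that follows it can be traced on a tiny example. The sketch below is a plain-Python illustration, not the patent's implementation: ReLU is assumed for the activation f, pooling is taken over all window positions as in the pooling formula, and the matrix values are made up.

```python
def relu(value):
    """Assumed activation function f."""
    return max(0.0, value)


def conv_features(sentence, kernel, bias, activation=relu):
    """Slide a kernel of height h over the rows (words) of an n x k matrix.

    Returns the n - h + 1 values alpha_i = f(W . X[i:i+h] + b).
    """
    h = len(kernel)
    features = []
    for i in range(len(sentence) - h + 1):
        window = sentence[i:i + h]
        total = sum(
            kernel[r][c] * window[r][c]
            for r in range(h)
            for c in range(len(kernel[r]))
        )
        features.append(activation(total + bias))
    return features


def max_pool(features):
    """Keep the strongest response across all window positions."""
    return max(features)


sentence = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [0.0, -1.0]]  # n = 4 words, k = 2
kernel = [[1.0, 1.0], [1.0, 1.0]]                             # h = 2, width k = 2
alphas = conv_features(sentence, kernel, bias=0.0)
t = max_pool(alphas)
```

With n = 4 and h = 2 the kernel visits n - h + 1 = 3 windows, and each convolution kernel contributes one pooled value t to the final feature vector.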
Preferably, the deep learning-based service access data detection method further includes: before vectorizing the request header and the other parts of the original request data separately, cleaning and preprocessing the original request data, which specifically comprises: conventional deduplication, similarity-based deduplication, replacing every 'nan' in the data with the number 0, decoding, deleting every '\n' and '\r' in the data, replacing every number in the data with 0, replacing all Chinese text in the data with 'chinese', performing word segmentation with the jieba word segmentation tool, and finally splicing the processed fields. Replacing numbers, deleting '\n' and '\r', and replacing Chinese character strings reduce the complexity of the data to a certain extent and make its features more distinct; the processed data is also generally shorter, which reduces memory consumption. The quality of the data source directly affects the performance of the model, so every data processing step is extremely important.
Preferably, the vectorization processing includes:
vector extraction is performed on "refer" and "user-agent" in the request header using a BERT word vector model, where the dimension of a word is defined as 768 and the text is converted into a 528 × 768 vector matrix;
vectorization is performed on "request_body", "url", and "method" using word2vec word vectors, where the dimension of a word is defined as 128 and the maximum length of each piece of data is defined as 1000. Based on the service access type, the method does not only extract features from the conventional url: it also vectorizes request_body, url, and method, together with refer and user-agent in the request header, and inputs them into the models for feature extraction, performing multi-feature extraction for the service type so as to improve the accuracy of data detection. The maximum length of 1000 per piece of data was determined experimentally from the maximum length, minimum length, and average length of the sample data: if the length is too long, the matrix becomes sparse and wastes space; if it is too short, fragments that carry features may be cut off, which can degrade the model.
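The effect of a fixed maximum length can be sketched as a pad-or-truncate step applied to each record's sequence of word vectors. The text does not say how short records are filled, so zero padding at the end is an assumption here, and `pad_or_truncate` is a hypothetical helper name.

```python
MAX_LEN = 1000   # maximum number of tokens per record, as stated in the text
EMBED_DIM = 128  # word2vec dimension, as stated in the text


def pad_or_truncate(vectors, max_len=MAX_LEN, dim=EMBED_DIM):
    """Return exactly max_len rows: cut long inputs, zero-pad short ones."""
    clipped = [list(v) for v in vectors[:max_len]]
    padding = [[0.0] * dim for _ in range(max_len - len(clipped))]
    return clipped + padding


short = pad_or_truncate([[1.0] * EMBED_DIM] * 3)      # 3 tokens, padded up
long_ = pad_or_truncate([[1.0] * EMBED_DIM] * 1200)   # 1200 tokens, cut down
```

The trade-off described above is visible here: a short record becomes mostly zero rows (a sparse matrix), while a long record loses every token past position 1000.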
Preferably, the number of network layers in the fully connected network model is 4, wherein:
first layer: number of neurons 128, activation function activation = "relu";
second layer: number of neurons 64, activation function activation = "relu";
a dropout of 0.2 is added;
third layer: number of neurons 64, activation function activation = "relu";
fourth layer: number of neurons 2, activation function activation = "sigmoid". Setting the fully connected network model to 4 layers is preferred because it greatly improves detection efficiency while ensuring that the accuracy of the model is not lower than 93%.
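To give a sense of how small this four-layer stack is, the following sketch counts its trainable parameters. The layer widths come from the text; the input width is not stated for the flattened header vector, so `ASSUMED_INPUT_DIM = 768` is purely an illustrative assumption.

```python
def dense_param_count(input_dim, layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    total = 0
    fan_in = input_dim
    for units in layer_sizes:
        total += fan_in * units + units  # weight matrix + bias vector
        fan_in = units
    return total


LAYER_SIZES = [128, 64, 64, 2]  # layer widths from the text
ASSUMED_INPUT_DIM = 768         # assumption: one 768-dimensional vector as input
total_params = dense_param_count(ASSUMED_INPUT_DIM, LAYER_SIZES)
```

Dropout adds no parameters, so it does not appear in the count.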
Preferably, in the convolutional neural network model, the loss function is an improved binary cross-entropy, given by formula (VII):
[Formula (VII): rendered as an image in the original; not reproduced here]
In formula (VII), the symbol for the predicted value is rendered as an image in the original and is not reproduced here; $y$ represents the true value, $l$ represents the loss function, and $\eta$ represents the accuracy of the model. When the difference between the predicted value and the true value is computed, the predicted value is first multiplied by the coefficient $\eta$ and the error is then calculated; the size of $\eta$ is chosen according to the actual scenario so that the loss function converges faster.
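Since formula (VII) itself is not reproduced in this text, the following is only one possible reading of the description: the prediction is multiplied by η before the usual binary cross-entropy is evaluated. The function name, the clipping constant `eps`, and the exact placement of η are all assumptions.

```python
import math


def scaled_binary_cross_entropy(y_true, y_pred, eta=1.0, eps=1e-7):
    """Binary cross-entropy evaluated on eta * y_pred (assumed reading)."""
    # Clip so that log() never receives 0 or a value above 1.
    scaled = min(max(eta * y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(scaled) + (1.0 - y_true) * math.log(1.0 - scaled))


plain = scaled_binary_cross_entropy(1.0, 0.9)             # eta = 1: standard loss
scaled = scaled_binary_cross_entropy(1.0, 0.9, eta=0.95)  # prediction scaled down
```

With η = 1 the sketch reduces to the standard binary cross-entropy; with η below 1 a positive example is penalized more for the same prediction.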
Preferably, in the convolutional neural network model, there are 3 convolutional layers and 2 fully connected layers;
first convolutional layer: number of convolution kernels 256, convolution kernel size 3 × 3;
MaxPooling1D(padding='same') is added;
second convolutional layer: number of convolution kernels 64, convolution kernel size 3 × 3;
MaxPooling1D(padding='same') is added;
third convolutional layer: number of convolution kernels 32, convolution kernel size 3 × 3;
MaxPooling1D(padding='same') is added;
Flatten() is added;
Dropout(0.3) is added;
first fully connected layer: number of neurons 32;
second fully connected layer: number of neurons 2. As the model structure becomes more complex, accuracy can improve to a certain extent but sometimes also falls, while the number of parameters to be computed rises sharply and the running time grows.
Preferably, the deep learning-based service access data detection method further includes model tuning, which specifically comprises: during training, continuously adjusting each hyper-parameter of the model, with the final values determined as follows:
the amount of data fed to the model per batch is batch_size = 128;
the sliding window size of the convolutional neural network model is kernel_size = 3;
the neuron drop rate is dropout = 0.3;
loss = "binary_crossentropy";
the gradient descent optimization algorithm is optimizer = "adam". The final hyper-parameters are determined experimentally so that the accuracy and recall of the model reach their best values.
Preferably, the deep learning-based service access data detection method includes the following steps:
step S1, cleaning and preprocessing the original request data;
step S2, vectorizing the request header and the other parts separately;
step S3, inputting the vector matrix of the request header into the fully connected network model and judging whether the request header is white data; if not, judging the original request data to be abnormal data; if so, proceeding to step S4;
step S4, inputting the vector matrices of the other parts into the convolutional neural network model and judging whether the other parts are white data; if so, judging the original request data to be white data; otherwise, judging the original request data to be abnormal data.
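The control flow of steps S3 and S4 amounts to a two-stage cascade with an early exit. The sketch below shows only that flow: the two classifier callables stand in for the trained fully connected and convolutional models, and the toy rules inside them are invented for illustration.

```python
HEADER_FIELDS = ("refer", "user-agent")
OTHER_FIELDS = ("url", "request_body", "method")


def detect(request, header_is_white, other_is_white):
    """Return "white" or "abnormal" for one request (steps S3 and S4)."""
    header_part = {name: request.get(name, "") for name in HEADER_FIELDS}
    if not header_is_white(header_part):  # S3: the cheap first-stage check
        return "abnormal"                 # early exit, second model never runs
    other_part = {name: request.get(name, "") for name in OTHER_FIELDS}
    return "white" if other_is_white(other_part) else "abnormal"  # S4


# Toy stand-ins for the two trained models (hypothetical rules).
def toy_header_model(header):
    return "sqlmap" not in header["user-agent"]


def toy_other_model(other):
    return "<script>" not in other["request_body"]


normal = {"user-agent": "Mozilla/5.0", "refer": "", "url": "/login",
          "method": "POST", "request_body": "user=a"}
result = detect(normal, toy_header_model, toy_other_model)
```

Because abnormal headers return before the second model is called, the more expensive convolutional stage only runs on requests whose header already looks normal.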
The invention has at least the following beneficial effects: the deep learning-based service access data detection method focuses on learning the characteristics of white data and can identify white data in the service more accurately;
in addition to extracting features from the conventional url, the invention vectorizes request_body, url, and method, together with refer and user-agent in the request header, and inputs them into the corresponding models for feature extraction;
the deep learning-based service access data detection method of the invention first detects the request header of the service access request data; if the request header is normal, the other parts of the request data, namely the request line and the request body, are detected; if the result of this stage is normal, the data is white data, and if the result of this stage is abnormal, the data is abnormal data; if the request header is abnormal, the data is directly judged to be abnormal data. Over multiple tests, the cross-validation accuracy of the method reaches about 99%, with a loss value of about 0.01 and a recall of about 97%; in tests in a real environment, the accuracy reaches about 93% and the recall reaches about 93%.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.
Drawings
Fig. 1 is a schematic flow chart of the deep learning-based service access data detection method of the present invention.
Detailed Description
The present invention is further described in detail below with reference to the attached drawings so that those skilled in the art can implement the invention by referring to the description text.
It will be understood that terms such as "having," "including," and "comprising," as used herein, do not preclude the presence or addition of one or more other elements or groups thereof.
As shown in Fig. 1, the present invention provides a deep learning-based service access data detection method, which includes the following steps:
step S1, cleaning and preprocessing the original request data;
step S2, vectorizing the request header and the other parts separately;
step S3, inputting the vector matrix of the request header into the fully connected network model and judging whether the request header is white data; if not, judging the original request data to be abnormal data; if so, proceeding to step S4;
step S4, inputting the vector matrices of the other parts into the convolutional neural network model and judging whether the other parts are white data; if so, judging the original request data to be white data; otherwise, judging the original request data to be abnormal data.
In step S1, to reduce noise in the data set and prepare for vectorization, the following operations are performed on the collected original request data:
S1-1, conventional deduplication;
S1-2, similarity-based deduplication;
S1-3, replacing every 'nan' in the data with the number 0;
S1-4, decoding;
S1-5, deleting every '\n' and '\r' in the data;
S1-6, replacing every number in the data with 0;
S1-7, replacing all Chinese text in the data with 'chinese';
S1-8, performing word segmentation with the jieba word segmentation tool;
S1-9, splicing the processed fields.
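The cleaning operations of step S1 can be sketched with the Python standard library. Several choices below are assumptions rather than details from the text: "decoding" is taken to mean URL decoding, each run of digits is collapsed to a single 0, a simple regular expression stands in for the jieba segmenter, similarity-based deduplication (S1-2) is omitted, and the field names in `FIELDS` are hypothetical.

```python
import re
from urllib.parse import unquote

FIELDS = ("method", "url", "request_body")  # hypothetical field names


def clean_field(value):
    """Apply S1-3 to S1-7 to one raw field value."""
    text = str(value)
    if text.lower() == "nan":
        text = "0"                                         # S1-3
    text = unquote(text)                                   # S1-4 (URL decoding assumed)
    text = text.replace("\n", "").replace("\r", "")        # S1-5
    text = re.sub(r"\d+", "0", text)                       # S1-6 (digit run -> 0, assumed)
    text = re.sub(r"[\u4e00-\u9fff]+", " chinese ", text)  # S1-7
    return text


def tokenize(text):
    """Stand-in for jieba (S1-8): word runs and single symbols."""
    return re.findall(r"\w+|[^\w\s]", text)


def preprocess(records):
    """Drop exact duplicates (S1-1), then clean, segment, and splice (S1-9)."""
    unique = dict.fromkeys(
        tuple(str(record.get(name, "nan")) for name in FIELDS)
        for record in records
    )
    return [
        " ".join(tok for value in row for tok in tokenize(clean_field(value)))
        for row in unique
    ]


raw = [{"method": "GET", "url": "/item?id=123%0A", "request_body": float("nan")}] * 2
rows = preprocess(raw)
```

The two identical raw records collapse into one output row, and the numeric id and the encoded newline no longer distinguish otherwise identical requests.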
In step S2, the vectorization specifically includes: vector extraction is performed on "refer" and "user-agent" in the request header using a BERT word vector model, where the dimension of a word is defined as 768 and the text is converted into a 528 × 768 vector matrix;
vectorization is performed on "request_body", "url", and "method" using a word2vec word vector model, where the dimension of a word is defined as 128 and the maximum length of each piece of data is defined as 1000. Different word vector models are used for the request header and the other parts of the request data so that the subsequent model analysis is more accurate.
In step S3, the number of network layers in the fully connected network model is 4, wherein:
first layer: number of neurons 128, activation function activation = "relu";
second layer: number of neurons 64, activation function activation = "relu"; a dropout of 0.2 is added;
third layer: number of neurons 64, activation function activation = "relu";
fourth layer: number of neurons 2, activation function activation = "sigmoid";
the output of the current network layer is the input of the next network layer, and the current network layer is computed according to formula (I):
[Formula (I): rendered as an image in the original; not reproduced here]
In formula (I), $y$ is the output of the current network layer, $w_i$ is a weight matrix, $x_i$ is the input of the i-th neuron, $b$ is a bias parameter, and $n$ is the number of neurons, where $n$ is a positive integer;
wherein each parameter is updated by gradient descent according to formula (II) and formula (III):
[Formula (II): rendered as an image in the original; not reproduced here]
[Formula (III): rendered as an image in the original; not reproduced here]
In formulas (II) and (III), $\alpha$ is the learning rate, $b_i$ is the bias parameter of the i-th neuron, $b_l$ is the bias parameter of the l-th network layer, $w_i$ is the weight matrix of the i-th neuron, and $W_l$ is the weight matrix of the l-th network layer, where $i$ is a positive integer from 1 to $n$ and $1 \le l \le 4$. The output of the fully connected network model is best when the number of network layers is 4. The number of neurons in each layer is chosen on the basis of the service access type and the vectors extracted from the request data features, which greatly improves the accuracy of the fully connected network model.
In step S4, the word2vec word vector model is embedded in the fully-connected network model, and the vector matrix of the other parts of the request data is input for training. The layers are specifically as follows:
first convolutional layer: number of convolution kernels 256, convolution kernel size 3 × 3;
MaxPooling1D(padding='same') is added;
second convolutional layer: number of convolution kernels 64, convolution kernel size 3 × 3;
MaxPooling1D(padding='same') is added;
third convolutional layer: number of convolution kernels 32, convolution kernel size 3 × 3;
MaxPooling1D(padding='same') is added;
Flatten() is added;
Dropout(0.3) is added;
first fully connected layer: number of neurons 32;
second fully connected layer: number of neurons 2.
The input of the convolutional layer is the vector matrix of each sentence. Each sentence has $n$ words and each word is represented by a $k$-dimensional word vector, so the input matrix has dimension $n \times k$; its width $k$ is both the dimension of the word vectors and the width of the convolution kernel. A convolution kernel $W$ of height $h$ is convolved with $h$ words and the result is passed through the activation function to obtain the feature $\alpha_i$; with the bias parameter denoted $b_j$, the convolution operation can be expressed as:
$\alpha_i = f(W \cdot X_{i:i+h-1} + b_j)$
In the formula, $\alpha_i$ represents the feature obtained by the i-th convolution operation; $X$ represents the words in the sentence; $f$ represents the activation function; $h$ represents the height of the convolution kernel; $W$ represents the weight matrix of the convolution kernel; $b_j$ represents the bias parameter of the j-th convolution kernel;
after the repeated convolution operations, the vector $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_{n-h+1}]$ is obtained and input into the pooling layer for a max-pooling operation:
$t = \max\{\alpha\}$, where $t$ denotes the feature and $n$ denotes the number of words in the sentence.
After three layers of convolution and pooling, the final feature vector $T = [t_1, t_2, \ldots, t_f]$ is obtained, where $f$ is the number of convolution kernels.
Finally, the weight matrix of the fully connected layer is
[Symbols rendered as images in the original; not reproduced here]
and the predicted result is obtained:
[Formula rendered as an image in the original; not reproduced here]
The loss function is an improved binary cross-entropy:
[Formula rendered as an image in the original; not reproduced here]
In the formula, the symbol for the predicted value is rendered as an image in the original and is not reproduced here; $y$ represents the true value, $l$ represents the loss function, $\eta$ represents the accuracy of the model, and $b_m$ represents the bias parameter of the fully connected layer.
The invention also tunes and tests the two models. Model tuning comprises: during training, continuously adjusting each hyper-parameter of the model, with the final values determined as follows:
the amount of data fed to the model per batch is batch_size = 128;
the CNN sliding window size is kernel_size = 3;
the neuron drop rate is dropout = 0.3;
loss = "binary_crossentropy";
the gradient descent optimization algorithm is optimizer = "adam".
The deep learning-based service access data detection method of the invention uses two models to detect service access request data. Over multiple tests, the cross-validation accuracy of the white model reaches about 99%, with a loss value of about 0.01 and a recall of about 97%; in tests in a real environment, the accuracy of the model reaches about 93% and the recall reaches about 93%.
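For readers unfamiliar with the two reported metrics, the sketch below shows how accuracy and recall are conventionally computed from labels and predictions. The labels are invented, and treating white data as the positive class (1) is an assumption; the text does not say which class the reported recall refers to.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Accuracy over all samples; recall over the samples labeled positive."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = correct / len(pairs)
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return accuracy, recall


# Made-up labels: 1 = white data (assumed positive class), 0 = abnormal data.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1]
accuracy, recall = accuracy_and_recall(y_true, y_pred)
```

In this toy example 6 of 8 predictions are correct and 4 of the 5 white samples are recovered.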
While embodiments of the invention have been described above, it is not limited to the applications set forth in the description and the embodiments, which are fully applicable in various fields of endeavor to which the invention pertains, and further modifications may readily be made by those skilled in the art, it being understood that the invention is not limited to the details shown and described herein without departing from the general concept defined by the appended claims and their equivalents.

Claims (8)

1. A deep learning-based service access data detection method, characterized by comprising: vectorizing the request header and the other parts of the original request data separately, then inputting the vector matrix of the request header into a fully connected network model for training, and judging whether the request header is white data; in the fully connected network model, the output of the current network layer is the input of the next network layer, and the current network layer is computed according to formula (I):
[Formula (I): rendered as an image in the original; not reproduced here]
in formula (I), $y$ is the output of the current network layer, $w_i$ is a weight matrix, $x_i$ is the input of the i-th neuron, $b$ is a bias parameter, and $n$ is the number of neurons, where $n$ is a positive integer;
wherein each parameter is updated according to formula (II) and formula (III):
[Formula (II): rendered as an image in the original; not reproduced here]
[Formula (III): rendered as an image in the original; not reproduced here]
in formulas (II) and (III), $\alpha$ is the learning rate; $b_i$ is the bias parameter of the i-th neuron; the bias parameter of the i-th neuron in the l-th network layer, the weight matrix of the i-th neuron in the l-th network layer, and the differentiation operator are denoted by symbols rendered as images in the original and not reproduced here; $w_i$ is the weight matrix of the i-th neuron; $f$ denotes the activation function; where $i$ is a positive integer from 1 to $n$ and $1 \le l \le 4$;
further comprising: when the request header is white data, inputting the vector matrix of the other parts of the original request data into a convolutional neural network model for training; the convolutional neural network model consists of convolutional layers, pooling layers, and fully connected layers; the convolution operation of the convolutional layer is:
$\alpha_i = f(W \cdot X_{i:i+h-1} + b_j)$ (IV)
in formula (IV), $\alpha_i$ represents the feature obtained by the i-th convolution operation; $f$ represents the activation function; $h$ represents the height of the convolution kernel; $W$ represents the weight matrix of the convolution kernel; $X$ represents the window during the convolution operation; $b_j$ represents the bias parameter of the j-th convolution kernel;
through the pooling operation, the final feature is obtained: $t = \max\{\alpha_1, \alpha_2, \ldots, \alpha_{n-h+1}\}$
the prediction result output by the fully connected layer is given by formula (VI):
[Formula (VI): rendered as an image in the original; not reproduced here]
in formula (VI), the symbols for the predicted value and for the weight matrix of the fully connected layer are rendered as images in the original and are not reproduced here; $T$ represents the final feature vector, and $b$ represents the bias parameter of the fully connected layer;
further comprising: inputting a vector matrix of a request head into a full-connection network model, judging whether the request head is white data or not, and if not, judging that the request head is abnormal data;
if yes, inputting the vector matrixes of other parts into the convolutional neural network model, judging whether the other parts are white data, and if yes, judging that the original request data are the white data; if not, judging that the original request data are abnormal data; the request header comprises a user-agent and a refer in the request data, and the other parts are url, request _ body and method in the request data.
2. The deep learning-based service access data detection method of claim 1, further comprising: before vectorizing the original request data respectively aiming at the request header and other parts, cleaning and preprocessing the original request data, which specifically comprises the following steps: conventional deduplication, similarity deduplication, total replacement of nan in the data by a number 0, decoding, total deletion of \ n and \ r in the data, total replacement of the number in the data by 0, total replacement of Chinese in the data by chinese, word segmentation by using a jieba word segmentation tool, and finally splicing the processed fields.
3. The deep learning-based service access data detection method of claim 1, wherein the vectorization process comprises:
vector extraction is performed on refer and user-agent in the request header using a BERT word vector model, where the dimension of a word is defined as 768 and the text is converted into a 528 × 768 vector matrix;
vectorization is performed on request_body, url, and method using word2vec word vectors, where the dimension of a word is defined as 128 and the maximum length of each piece of data is defined as 1000.
4. The deep learning-based service access data detection method according to claim 1, wherein the number of network layers in the fully-connected network model is 4, wherein:
first layer: number of neurons 128, activation function activation = "relu";
second layer: number of neurons 64, activation function activation = "relu";
dropout of 0.2 is added;
third layer: number of neurons 64, activation function activation = "relu";
fourth layer: number of neurons 2, activation function activation = "sigmoid".
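The layer stack of claim 4 can be illustrated with a small pure-Python forward pass. This is a sketch only: the weights are random rather than trained, the input dimension is left as a parameter because the claim does not say how the 528 × 768 matrix is fed to the first layer, and the names `LAYERS`, `init_params` and `forward` are hypothetical. Dropout is inactive at inference time, so it appears only as a comment.

```python
import math
import random

# (units, activation) for the four layers listed in the claim
LAYERS = [(128, "relu"), (64, "relu"), (64, "relu"), (2, "sigmoid")]


def init_params(input_dim, seed=0):
    """Create random weights and zero biases for each layer (untrained)."""
    rng = random.Random(seed)
    params, fan_in = [], input_dim
    for units, _ in LAYERS:
        w = [[rng.uniform(-0.1, 0.1) for _ in range(fan_in)] for _ in range(units)]
        params.append((w, [0.0] * units))
        fan_in = units
    return params


def forward(x, params):
    """Inference-time forward pass; the 0.2 dropout after layer two is a no-op here."""
    for (w, b), (_, act) in zip(params, LAYERS):
        z = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
        if act == "relu":
            x = [max(0.0, v) for v in z]
        else:
            x = [1.0 / (1.0 + math.exp(-v)) for v in z]
    return x
```

The two sigmoid outputs each lie strictly between 0 and 1; how they are mapped to the white/abnormal decision is not stated in the claim.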
5. The deep learning-based service access data detection method according to claim 1, wherein in the convolutional neural network model, the loss function is an improved binary cross-entropy, given by the following formula:
Figure FDA0003524472330000031
in the formula, the symbol shown in
Figure FDA0003524472330000032
represents the predicted value, y represents the true value, l represents the loss function, and η represents the accuracy of the model.
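The claimed formula survives here only as an image reference, so its exact form cannot be recovered from this text. For orientation only, the standard (unmodified) binary cross-entropy, written with the claim's symbols and with \hat{y} assumed to denote the predicted value, is shown below. How the accuracy term η enters the improved version is not recoverable here, and this block should not be read as the patented loss.

```latex
l(y, \hat{y}) = -\bigl[\, y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \,\bigr]
```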
6. The deep learning-based service access data detection method of claim 5, wherein there are 3 convolutional layers and 2 fully-connected layers, wherein:
first convolutional layer: the number of convolution kernels is 256, and the convolution kernel size is 3 × 3;
MaxPooling1D(padding='same') is added;
second convolutional layer: the number of convolution kernels is 64, and the convolution kernel size is 3 × 3;
MaxPooling1D(padding='same') is added;
third convolutional layer: the number of convolution kernels is 32, and the convolution kernel size is 3 × 3;
MaxPooling1D(padding='same') is added;
Flatten() is added;
Dropout(0.3) is added;
first fully-connected layer: the number of neurons is 32;
second fully-connected layer: the number of neurons is 2.
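The tensor shapes implied by this stack can be traced with a short calculation. The sketch assumes one-dimensional convolutions of width 3 (consistent with the MaxPooling1D layers and the kernel_size of 3 in claim 7, although the claim writes "3 × 3"), unpadded ('valid') convolutions, a pooling size of 2, and the 1000 × 128 input of claim 3. None of these choices is confirmed by the claim, and `trace_cnn` is a hypothetical helper.

```python
import math


def trace_cnn(seq_len=1000, channels=128, kernel=3, pool=2):
    """Track sequence length, channels and parameter count through the stack."""
    params = 0
    for filters in (256, 64, 32):
        params += kernel * channels * filters + filters  # conv weights + biases
        seq_len = seq_len - kernel + 1                   # 'valid' convolution (assumed)
        seq_len = math.ceil(seq_len / pool)              # pooling with padding='same'
        channels = filters
    flat_features = seq_len * channels                   # Flatten()
    fan_in = flat_features
    for units in (32, 2):                                # two fully-connected layers
        params += fan_in * units + units
        fan_in = units
    return seq_len, channels, flat_features, params


if __name__ == "__main__":
    print(trace_cnn())
```

Under these assumptions the sequence length shrinks from 1000 to 124 and Flatten() yields 3968 features, so most of the model's weights sit in the first fully-connected layer.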
7. The deep learning-based service access data detection method of claim 2, further comprising: optimizing the model, specifically comprising: during training, continuously adjusting each hyper-parameter of the model, the hyper-parameters finally being determined as follows:
the size of each batch of data fed to the model, batch_size, is 128;
the sliding window size of the convolutional neural network model, kernel_size, is 3;
the neuron drop rate, dropout, is 0.3;
the loss function is loss = "binary_crossentropy";
the gradient descent optimization algorithm is optimizer = "adam".
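The final hyper-parameters can be collected in a single immutable configuration object, as in the sketch below. The class name `TrainConfig` and the helper `steps_per_epoch` are hypothetical; only the five values come from the claim.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    batch_size: int = 128                 # samples fed to the model per batch
    kernel_size: int = 3                  # sliding window size of the CNN
    dropout: float = 0.3                  # neuron drop rate
    loss: str = "binary_crossentropy"
    optimizer: str = "adam"

    def steps_per_epoch(self, n_samples: int) -> int:
        """Number of batches needed to cover n_samples once."""
        return math.ceil(n_samples / self.batch_size)
```

Keeping the values in one place makes it easier to record which combination a given training run actually used.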
8. The deep learning-based service access data detection method according to claim 1, comprising the steps of:
step S1, cleaning and preprocessing the original request data;
step S2, vectorizing the request header and the other parts respectively;
step S3, inputting the vector matrix of the request header into the fully-connected network model and judging whether the request header is white data; if not, judging it to be abnormal data; if yes, going to step S4;
step S4, inputting the vector matrices of the other parts into the convolutional neural network model and judging whether the other parts are white data; if yes, judging that the original request data are white data; otherwise, judging that the original request data are abnormal data.
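The two-stage decision of steps S3 and S4 amounts to a short-circuiting cascade, sketched below. The two classifiers are passed in as plain callables that return True for white data; these callables and the function name `detect` are hypothetical stand-ins for the trained fully-connected and convolutional models.

```python
def detect(header_matrix, body_matrix, header_is_white, body_is_white):
    """Return "white" only if both stages accept the request, else "abnormal"."""
    if not header_is_white(header_matrix):   # step S3: fully-connected model
        return "abnormal"                    # the second model is never evaluated
    if not body_is_white(body_matrix):       # step S4: convolutional model
        return "abnormal"
    return "white"
```

Because the header check runs first, a request rejected at step S3 never reaches the convolutional model.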
CN202111084993.1A 2021-09-16 2021-09-16 Business access data detection method based on deep learning Active CN113806739B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111084993.1A CN113806739B (en) 2021-09-16 2021-09-16 Business access data detection method based on deep learning

Publications (2)

Publication Number Publication Date
CN113806739A CN113806739A (en) 2021-12-17
CN113806739B true CN113806739B (en) 2022-04-19

Family

ID=78895529

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111084993.1A Active CN113806739B (en) 2021-09-16 2021-09-16 Business access data detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN113806739B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107682216A (en) * 2017-09-01 2018-02-09 南京南瑞集团公司 A kind of network traffics protocol recognition method based on deep learning
CN109391700A (en) * 2018-12-12 2019-02-26 北京华清信安科技有限公司 Internet of Things safe cloud platform based on depth traffic aware
CN109684911A (en) * 2018-10-30 2019-04-26 百度在线网络技术(北京)有限公司 Expression recognition method, device, electronic equipment and storage medium
CN110348014A (en) * 2019-07-10 2019-10-18 电子科技大学 A kind of semantic similarity calculation method based on deep learning
CN110929118A (en) * 2019-11-04 2020-03-27 腾讯科技(深圳)有限公司 Network data processing method, equipment, device and medium
CN113032777A (en) * 2021-02-26 2021-06-25 济南浪潮高新科技投资发展有限公司 Web malicious request detection method and equipment
CN113098887A (en) * 2021-04-14 2021-07-09 西安工业大学 Phishing website detection method based on website joint characteristics

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106296692A (en) * 2016-08-11 2017-01-04 深圳市未来媒体技术研究院 Image significance detection method based on antagonism network
US11023295B2 (en) * 2019-10-25 2021-06-01 Accenture Global Solutions Limited Utilizing a neural network model to determine risk associated with an application programming interface of a web application

Also Published As

Publication number Publication date
CN113806739A (en) 2021-12-17

Similar Documents

Publication Publication Date Title
CN111126386B (en) Sequence domain adaptation method based on countermeasure learning in scene text recognition
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
CN109241530B (en) Chinese text multi-classification method based on N-gram vector and convolutional neural network
CN108170736B (en) Document rapid scanning qualitative method based on cyclic attention mechanism
CN110825877A (en) Semantic similarity analysis method based on text clustering
WO2022126810A1 (en) Text clustering method
CN112732921B (en) False user comment detection method and system
CN111984791B (en) Attention mechanism-based long text classification method
CN110377605B (en) Sensitive attribute identification and classification method for structured data
CN110097096B (en) Text classification method based on TF-IDF matrix and capsule network
CN110543564A (en) Method for acquiring domain label based on topic model
CN112529638B (en) Service demand dynamic prediction method and system based on user classification and deep learning
CN110414587A (en) Depth convolutional neural networks training method and system based on progressive learning
CN116756303A (en) Automatic generation method and system for multi-topic text abstract
CN111831822A (en) Text multi-classification method for unbalanced data set based on text multi-classification mixed equipartition clustering sampling algorithm
CN108388918B (en) Data feature selection method with structure retention characteristics
CN113177578A (en) Agricultural product quality classification method based on LSTM
CN113806739B (en) Business access data detection method based on deep learning
CN112347247A (en) Specific category text title binary classification method based on LDA and Bert
CN117315534A (en) Short video classification method based on VGG-16 and whale optimization algorithm
CN111460817A (en) Method and system for recommending criminal legal document related law provision
Huang et al. An empirical study on the classification of Chinese news articles by machine learning and deep learning techniques
CN112989052B (en) Chinese news long text classification method based on combination-convolution neural network
CN111950717B (en) Public opinion quantification method based on neural network
CN114881172A (en) Software vulnerability automatic classification method based on weighted word vector and neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant