CN115238799A - AI-based random forest malicious traffic detection method and system

Info

Publication number: CN115238799A
Application number: CN202210892613.5A
Authority: CN
Inventors: 胡文波; 齐帅; 范传庆
Original assignee: Tianjin Guorui Digital Safety System Co ltd
Current assignee: Tianjin Guorui Digital Safety System Co ltd
Priority date: 2022-07-27
Filing date: 2022-07-27
Publication date: 2022-10-25

Abstract

The invention provides an AI-based random forest method and system for detecting malicious traffic. The data stream is sampled with dimensionality reduction to obtain a discretized data stream, which lowers the computing speed required by subsequent steps and greatly reduces the amount of computation. By calling a syntactic model and a semantic analysis model, sentence segmentation and redundancy filtering of the data stream are completed automatically to obtain a feature vector matrix, so that feature extraction is automated by artificial intelligence. Through a convolutional neural network and random forest classification, the required feature vectors are further highlighted and different classification capabilities are integrated, which addresses the prior-art problems that constantly changing attacks are difficult to detect and that the amount of computation is huge.

Description

AI-based random forest malicious traffic detection method and system

Technical Field

The application relates to the technical field of network security, and in particular to an AI (artificial intelligence)-based random forest method and system for detecting malicious traffic.

Background

With the rapid development of networks, networks have become part of people's daily life. Malicious code has developed just as rapidly, network security problems have become increasingly prominent, and malicious activity has taken on the form of an industrial chain. At present, known malicious code attacks pose a relatively small threat, whereas potential malicious code carries enormous destructive power, and its means and forms of attack change constantly and are difficult to detect.

Meanwhile, the body of sample data for malicious code keeps growing, and training machine learning on such huge data sets has itself become a technical problem; improvements are needed to speed up machine learning.

Therefore, a targeted AI random forest-based malicious traffic detection method and system are urgently needed.

Disclosure of Invention

The invention aims to provide an AI (artificial intelligence)-based random forest method and system for detecting malicious traffic, which solve the prior-art problem that attacks whose means and forms change constantly are difficult to detect, and which improve the detection method to speed up model identification.

In a first aspect, the present application provides a method for detecting malicious traffic based on AI random forest, where the method includes:

receiving a data stream sent by an acquisition terminal, extracting the field content of a message header from the data stream, identifying different clients, and generating an independent identifier for each client;

discretizing the data stream, and sampling the data stream according to time domain continuity to obtain a discrete data stream after dimensionality reduction;

respectively establishing different containers according to the identifiers, wherein the containers are used for storing the feature vectors corresponding to different clients;

acquiring the discrete data stream, calling a syntactic model of the server, breaking sentences, automatically querying a dictionary to obtain a first word component, and storing the first word component into a container corresponding to the identifier to which the first word component belongs;

extracting the first word component from the corresponding container according to the identifier corresponding to the required client, inputting the first word component into the semantic analysis model of the server one by one, and receiving the returned word meaning corresponding to the first word component;

filtering redundant information from the word meaning according to a first rule to obtain a second word component corresponding to the filtered word meaning, and forming a first word component matrix;

inputting the first word component matrix into an input layer of a recognition model, and calculating standard deviations of different parts of speech, wherein the standard deviations are used for determining the width of a sliding window of a subsequent convolutional layer; the recognition model is a model architecture based on a random forest and a convolutional neural network;

the output of the input layer is sent into a convolutional layer of the recognition model, local word components in the text are selected by utilizing sliding windows with different sizes, a second word component matrix is obtained by splicing the local word components, and the second word component matrix is sent into a pooling layer of the recognition model;

the pooling layer, by choosing a pooling function, selects the feature values that effectively distinguish the word meanings, and a third word component matrix is obtained by splicing again;

the processed third word component matrix is transmitted to a random forest of the recognition model for classification; the random forest performs n rounds of sampling on the third word component matrix to obtain n training sets, each of the n training sets is used to train one decision tree on a specified number of feature values selected at random by column sampling, yielding n decision trees, and the n decision trees produce the classification result by voting;

and judging whether the data stream sent by the acquisition terminal comprises an attack vector or not according to the classification result, if so, blocking the data stream, and otherwise, allowing the data stream.
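The first three steps above (per-client identifiers, time-domain sampling, and per-client containers) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the header fields `src_ip` and `user_agent`, the SHA-256 based identifier, the fixed sampling step, and all function names are hypothetical, since the text does not specify them.

```python
import hashlib
from collections import defaultdict

def client_identifier(header: dict) -> str:
    """Derive a stable identifier from header fields (field choice is an assumption)."""
    key = "|".join(str(header.get(f, "")) for f in ("src_ip", "user_agent"))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]

def downsample(packets: list, step: int) -> list:
    """Keep every `step`-th packet in timestamp order (assumed sampling rule)."""
    ordered = sorted(packets, key=lambda p: p["ts"])
    return ordered[::step]

def build_containers(packets: list, step: int = 2) -> dict:
    """Store the down-sampled payloads in one container (a list) per client."""
    containers = defaultdict(list)
    for pkt in downsample(packets, step):
        containers[client_identifier(pkt["header"])].append(pkt["payload"])
    return dict(containers)

# Hypothetical traffic from two clients, deliberately out of time order.
c1 = {"src_ip": "10.0.0.1", "user_agent": "curl"}
c2 = {"src_ip": "10.0.0.2", "user_agent": "wget"}
packets = [
    {"ts": 3, "header": c2, "payload": "GET /c"},
    {"ts": 1, "header": c1, "payload": "GET /a"},
    {"ts": 4, "header": c2, "payload": "GET /d"},
    {"ts": 2, "header": c1, "payload": "GET /b"},
]
containers = build_containers(packets, step=2)
```

With a step of 2, the packets at timestamps 1 and 3 survive the sampling, so each client's container ends up holding one payload.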

In a second aspect, the present application provides a malicious traffic detection system based on AI random forest, the system includes:

the preprocessing module is used for receiving a data stream sent by an acquisition terminal, extracting the field content of a message header from the data stream, identifying different clients and generating a separate identifier for each client; discretizing the data stream, and sampling the data stream according to time domain continuity to obtain a discrete data stream after dimensionality reduction;

the container module is used for respectively establishing different containers according to the identifiers and storing the feature vectors corresponding to different clients;

the AI module is used for acquiring the discrete data stream, calling a syntactic model of the server, carrying out sentence breaking, automatically querying a dictionary to obtain a first word component, and storing the first word component into a container corresponding to the identifier to which the first word component belongs; extracting the first word component from the corresponding container according to the identifier corresponding to the required client, inputting the first word component into the semantic analysis model of the server one by one, and receiving the returned word meaning corresponding to the first word component; filtering redundant information from the word meaning according to a first rule to obtain a second word component corresponding to the filtered word meaning, and forming a first word component matrix;

the recognition module comprises a recognition model, the recognition model is a model framework based on a random forest and a convolutional neural network, and is used for receiving the first word component matrix output by the AI module, inputting the first word component matrix into an input layer of the recognition model, and calculating standard deviations of different parts of speech, wherein the standard deviations are used for determining the width of a sliding window of a subsequent convolutional layer; the output of the input layer is sent into a convolutional layer of the recognition model, local word components in the text are selected by utilizing sliding windows with different sizes, a second word component matrix is obtained by splicing the local word components, and the second word component matrix is sent into a pooling layer of the recognition model; the pooling layer selects characteristic values for distinguishing the word meanings through selecting a pooling function, and a third word component matrix is obtained through splicing again;

and the execution module is used for judging whether the data stream sent by the acquisition terminal comprises an attack vector according to the classification result, blocking the data stream if the data stream comprises the attack vector, and allowing the data stream if the data stream does not comprise the attack vector.
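The input, convolution, and pooling stages of the recognition module can be illustrated with a small pure-Python sketch. The text does not say how a standard deviation maps to a window width, nor which filter or pooling function is used, so the rule in `window_width`, the column-wise averaging filter, and the max pooling below are all assumptions chosen for illustration.

```python
from statistics import pstdev

def window_width(values, max_width=4):
    """Hypothetical rule: a larger spread gives a wider window, clamped to [1, max_width]."""
    return max(1, min(max_width, 1 + round(pstdev(values))))

def sliding_windows(rows, width):
    """All contiguous windows of `width` rows over the word component matrix."""
    return [rows[i:i + width] for i in range(len(rows) - width + 1)]

def conv_and_pool(matrix, widths):
    """For each width, average every window column-wise (a fixed, untrained filter),
    max-pool over window positions, and splice the pooled vectors together."""
    pooled = []
    for w in widths:
        feats = [[sum(col) / w for col in zip(*win)]
                 for win in sliding_windows(matrix, w)]
        pooled.extend(max(col) for col in zip(*feats))
    return pooled

# Hypothetical per-part-of-speech scores and a 3x2 word component matrix.
scores = {"verb": [1.0, 3.0, 5.0], "noun": [2.0, 2.0, 2.0]}
widths = sorted({window_width(v) for v in scores.values()})
matrix = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
pooled = conv_and_pool(matrix, widths)
```

Here the high-variance part of speech yields a window of width 3 and the constant one a window of width 1, so the spliced result carries features at two scales.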

In a third aspect, the present application provides a system for detecting malicious traffic based on AI random forest, where the system includes a processor and a memory:

the memory is used for storing program codes and transmitting the program codes to the processor;

the processor is configured to perform, according to instructions in the program code, the method of any one of the four possible implementations of the first aspect.

In a fourth aspect, the present application provides a computer-readable storage medium for storing program code for performing the method of any one of the four possible implementations of the first aspect.

Advantageous effects

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a schematic flow chart of an AI-based random forest malicious traffic detection method according to the present invention;

FIG. 2 is an architecture diagram of an AI random forest malicious traffic detection system according to the present invention.

Detailed Description

The preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings so that the advantages and features of the present invention can be more easily understood by those skilled in the art, and the scope of the present invention can be more clearly defined.

Fig. 1 is a general flowchart of an AI random forest-based malicious traffic detection method provided in the present application, where the method includes:

respectively establishing different containers according to the identifiers, wherein the containers are used for storing the feature vectors corresponding to different clients;

the larger the standard deviation, the more significant the part of speech is in recognition (e.g., attack).

and transmitting the processed third word component matrix to the random forest of the recognition model for classification, where the random forest uses decision trees to distinguish the feature matrix and sort it under different root directories, reasoning and judging according to knowledge and experience provided by experts so as to simulate the decision process of human experts.

The random forest performs n rounds of sampling on the third word component matrix to obtain n training sets; each of the n training sets is used to train one decision tree on a specified number of feature values selected at random by column sampling, yielding n decision trees, and the n decision trees produce the classification result by voting;

and judging whether the data stream sent by the acquisition terminal comprises an attack vector according to the classification result, if so, blocking the data stream, and otherwise, allowing the data stream.
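The random forest stage described above (n rounds of sampling, column sampling of a specified number of features, n trees, and voting) can be sketched as follows. To stay short, the sketch swaps in one-level decision stumps for full decision trees and uses bootstrap sampling with replacement for the "n rounds of extraction"; both choices, along with all function names and the toy data, are assumptions rather than details from the text.

```python
import random
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels, columns):
    """Fit a one-level tree on the sampled columns; returns (col, threshold, left, right)."""
    best = None
    for c in columns:
        values = sorted({r[c] for r in rows})
        for t in [(a + b) / 2 for a, b in zip(values, values[1:])]:  # midpoints
            left = [y for r, y in zip(rows, labels) if r[c] <= t]
            right = [y for r, y in zip(rows, labels) if r[c] > t]
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, c, t, l_lab, r_lab)
    if best is None:  # no column could be split: always predict the majority label
        m = majority(labels)
        return (0, float("inf"), m, m)
    return best[1:]

def predict_stump(stump, row):
    c, t, l_lab, r_lab = stump
    return l_lab if row[c] <= t else r_lab

def fit_forest(rows, labels, n_trees, n_features, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]       # one round of row sampling
        cols = rng.sample(range(len(rows[0])), n_features)   # column sampling
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], cols))
    return forest

def predict_forest(forest, row):
    """Unweighted majority vote over all trees."""
    return majority([predict_stump(s, row) for s in forest])

# Toy stand-in for the third word component matrix (rows) and its labels.
rows = [[0, 0, 0], [1, 1, 1], [2, 2, 2], [8, 8, 8], [9, 9, 9], [10, 10, 10]]
labels = ["benign"] * 3 + ["attack"] * 3
forest = fit_forest(rows, labels, n_trees=25, n_features=2)
verdict = "block" if predict_forest(forest, [9, 9, 9]) == "attack" else "allow"
```

The last line mirrors the final step: a stream classified as containing an attack vector is blocked, and any other stream is allowed.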

In some preferred embodiments, when the recognition model is trained, the entropy loss function is minimized by backpropagation so as to avoid supersaturation; when the accuracy of the recognition model meets the threshold requirement, training is complete and the model can then be used for data verification.
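The training procedure in this embodiment can be illustrated with a deliberately small stand-in. The sketch assumes the "entropy loss" is binary cross-entropy and replaces the convolutional network with a single sigmoid unit, for which backpropagation reduces to one gradient step per epoch; the learning rate, threshold value, and toy data are hypothetical.

```python
import math

def predict_proba(w, b, x):
    """Sigmoid output of a single linear unit (stand-in for the full network)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(w, b, xs, ys):
    """Mean binary cross-entropy; eps guards against log(0)."""
    eps = 1e-12
    losses = []
    for x, y in zip(xs, ys):
        p = predict_proba(w, b, x)
        losses.append(-(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)))
    return sum(losses) / len(losses)

def accuracy(w, b, xs, ys):
    hits = sum((predict_proba(w, b, x) >= 0.5) == (y == 1) for x, y in zip(xs, ys))
    return hits / len(xs)

def train(xs, ys, lr=0.5, threshold=0.95, max_epochs=1000):
    """Gradient descent on the cross-entropy, stopping once accuracy meets the threshold."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(max_epochs):
        if accuracy(w, b, xs, ys) >= threshold:
            break
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            err = predict_proba(w, b, x) - y  # d(loss)/dz for sigmoid + cross-entropy
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return w, b

# Toy one-feature data: label 1 marks an attack sample.
xs = [[0.0], [1.0], [3.0], [4.0]]
ys = [0, 0, 1, 1]
w, b = train(xs, ys)
```

Stopping on an accuracy threshold rather than running a fixed number of epochs follows the embodiment's wording; in practice the accuracy would be measured on held-out data.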

In some preferred embodiments, the classification capability of each decision tree is targeted: the specified number of feature values is obtained according to different classifications, and the decision trees classify the same feature vector matrix from different angles, thereby integrating different classification capabilities. The resulting classification performance is higher than that of a single classifier.

The average generalization error of a decision tree in a random forest is related to the regression function.

In some preferred embodiments, the voting mode includes performing weighted accumulation on the output result of each decision tree.
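Weighted accumulation of the tree outputs can be sketched as below. The text does not state where the weights come from; the sketch assumes one non-negative weight per tree (for example, a validation accuracy), and the function name and sample values are hypothetical.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Sum the weight behind each predicted label and return the heaviest label."""
    if len(predictions) != len(weights):
        raise ValueError("one weight is required per decision tree")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)

# Two trees say "benign", but the single "attack" vote carries more weight.
result = weighted_vote(["attack", "benign", "benign"], [0.9, 0.3, 0.4])
```

With equal weights this reduces to ordinary majority voting, so the weighted form is a strict generalization of the voting step in the base method.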

Fig. 2 is an architecture diagram of an AI random forest-based malicious traffic detection system provided in the present application, where the system includes:

the preprocessing module is used for receiving a data stream sent by an acquisition terminal, extracting the field content of the message header from the data stream, identifying different clients and generating a separate identifier for each client; discretizing the data stream, and sampling the data stream according to time domain continuity to obtain a discrete data stream after dimensionality reduction;

and the execution module is used for judging whether the data stream sent by the acquisition terminal comprises an attack vector according to the classification result, blocking the data stream if the data stream comprises the attack vector, and allowing the data stream if the data stream does not comprise the attack vector.

The application provides an AI-based random forest malicious traffic detection system, where the system includes a processor and a memory:

the processor is configured to perform the method according to any of the embodiments of the first aspect according to instructions in the program code.

The present application provides a computer readable storage medium for storing program code for performing the method of any one of the embodiments of the first aspect.

In a specific implementation, the present invention further provides a computer storage medium, where the computer storage medium may store a program, and the program, when executed, may perform some or all of the steps in the embodiments of the present invention. The storage medium can be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.

Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented as software plus a required general purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.

The same and similar parts in the various embodiments of the present specification may be referred to each other. In particular, the system embodiments are described only briefly because they are substantially similar to the method embodiments; for the relevant points, refer to the description of the method embodiments.

The above-described embodiments of the present invention do not limit the scope of the present invention.

Claims

1. A malicious traffic detection method based on AI random forests is characterized by comprising the following steps:

filtering redundant information from the word meaning according to a first rule to obtain second word components corresponding to the filtered word meaning, and forming a first word component matrix;

2. The method of claim 1, wherein: when the recognition model is trained, the entropy loss function is minimized by backpropagation so as to avoid supersaturation, and training of the recognition model is complete when the accuracy of the recognition model meets the threshold requirement.

3. The method of claim 1, wherein: the classification capability of each decision tree is targeted, the specified number of feature values is obtained according to different classifications, and the decision trees classify the same feature vector matrix from different angles, thereby integrating different classification capabilities.

4. The method according to claim 2 or 3, characterized in that: the voting mode comprises performing weighted accumulation on the output result of each decision tree.

5. An AI-based random forest malicious traffic detection system, the system comprising:

6. An AI-based random forest malicious traffic detection system, the system comprising a processor and a memory:

the processor is configured to perform, according to instructions in the program code, the method of any one of claims 1-4.

7. A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store program code for performing the method of any one of claims 1-4.