CN114157498B - WEB high-interaction honeypot system based on artificial intelligence and attack prevention method

Info

Publication number: CN114157498B
Application number: CN202111483818.XA
Authority: CN
Inventors: 邹福泰; 郭万达; 任蕴东; 吴越; 李林森; 易平
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2021-12-07
Filing date: 2021-12-07
Publication date: 2022-08-16
Anticipated expiration: 2041-12-07
Also published as: CN114157498A

Abstract

The invention relates to the field of computer network security and discloses a WEB high-interaction honeypot system based on artificial intelligence, together with an attack prevention method. The system collects an attacker's malicious requests by monitoring the ports associated with WEB services, learns the response message corresponding to each malicious request by forwarding the received request to the public network and collecting the responses, replies to the attacker with a response message, and records the content of the attack in the access log database. The method combines the low resource usage and simple deployment of low-interaction honeypots with the high degree of camouflage and strong interaction capability of high-interaction honeypots, and uses artificial intelligence technology to further improve the interaction performance of the honeypot system.

Description

WEB high-interaction honeypot system based on artificial intelligence and attack prevention method

Technical Field

The invention relates to the field of computer network security, in particular to a WEB high-interaction honeypot system based on artificial intelligence.

Background

A honeypot is a computer security mechanism used to detect or defend against unauthorized operations or hacking; the name comes from its working principle, which resembles a pot of honey used to trap insects. Honeypots are usually disguised as valuable networks, data, or computer systems and are deliberately given vulnerabilities to attract attacks. Because a honeypot provides no genuinely valuable service to the network, any attempt to access it is suspicious. Monitoring software can also be installed in the honeypot to observe an attacker's behavior after the intrusion.

Reinforcement learning is an important research direction in artificial intelligence and is the third major branch of machine learning, alongside supervised learning and unsupervised learning. Its main difference from the other two branches is the addition of a reward mechanism: reinforcement learning is not constrained by the labels or structural relationships of a data set. Its data are also not independent, because the states in reinforcement learning are closely linked by transitions and dependencies. In other words, reinforcement learning does not require an explicit supervisory signal, which extends its applicability to multi-task and diverse scenarios.

The Q-Learning algorithm is one of the classic reinforcement learning algorithms. It selects a cumulatively optimal policy on the basis of a Markov decision process. Q-Learning iteratively updates the state-action value function Q by temporal-difference learning using the Bellman equation, so that Q approaches the true optimal value function Q*, from which the cumulatively optimal policy is finally obtained.
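For reference, the temporal-difference update described in words above is conventionally written as follows. The symbols are the standard textbook ones and are not defined in the patent itself: s_t and a_t are the state and action at step t, r_{t+1} is the reward received afterwards, \alpha is the learning rate and \gamma is the discount factor.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```

In the honeypot setting described later in this document, a natural reading is that the state is the attacker's request message and the action is the response message chosen by the honeypot; that mapping is an interpretation, not a statement from the source.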

Honeypots can be classified by interaction level into low-interaction honeypots and high-interaction honeypots.

A low-interaction honeypot tends to simulate only a particular service, a protocol, or a small portion of a real scenario. By capturing and analyzing network traffic, it obtains basic information such as the attacker's source IP address, network request information and payload information. Its most important advantages are small scale and ease of deployment, which have made it a common tool for collecting attack data in the field of network security. Its light weight, however, also brings disadvantages that are difficult to avoid. Most critically, a low-interaction honeypot is limited by its scale and is poorly camouflaged, so attackers can easily discover that it is a honeypot.

A high-interaction honeypot, in general, comprehensively simulates the devices and services of a real scenario, especially the operating environment, so as to better deceive technically capable attackers. Such honeypots often interact with an attacker over a long time and many rounds and closely simulate a real system environment, hence the name. Their main advantages are strong interaction capability and a high degree of simulation, which make them difficult for an attacker to distinguish from real systems. Their main disadvantages are large deployment scale and high resource demand, which lead to high deployment cost and difficulty. In addition, if a high-interaction honeypot is compromised, an attacker may use it as a springboard to attack other systems.

In summary, low-interaction and high-interaction honeypots each have significant advantages and disadvantages. The former are small and easy to deploy and maintain, but their degree of simulation is low and attackers can easily identify them as honeypots; the latter offer a high degree of simulation and strong interactivity, but often need a large amount of software and hardware resources for deployment, operation and maintenance.

Meanwhile, most existing honeypots close the connection to an attacker immediately after recognizing an attack. This minimizes the risk of the honeypot system being fully compromised and still yields some preliminary attack data, but it deprives researchers of the opportunity to observe the attacker's further operations.

The invention aims to remedy the deficiencies of both types of honeypot system by designing a high-interaction honeypot system that is lightweight, strongly interactive, and capable of inducing attackers to carry out sustained interactive attacks. In its initial state the honeypot system cannot respond to an attacker immediately. Instead, when attacked, it forwards the attack request to the public network, acquires and stores the responses of real vulnerable devices to the request, and finally forwards a response to the attacker, so that the attacker mistakenly believes a real device has been compromised and is drawn into further attack operations. In the design and implementation of the honeypot system, Q-Learning, a classic algorithm of the currently popular reinforcement learning technology, is used to improve the honeypot's interaction performance and its ability to select, from a large volume of response data, the response to send to the attacker.

Disclosure of Invention

In view of the above drawbacks of the prior art, the technical problem to be solved by the present invention is to design a lightweight honeypot system with high interactive performance by combining the advantages of low-interaction honeypots and high-interaction honeypots and minimizing the disadvantages of the two.

In order to achieve the purpose, the invention provides a WEB high-interaction honeypot system based on artificial intelligence.

A WEB high-interaction honeypot system based on artificial intelligence comprises four layers, namely a request collection layer, a response acquisition layer, an artificial intelligence layer and a log recording layer, and four databases, namely an active port database, a request-response database, an artificial intelligence database and an access log database. The request collection layer collects malicious requests from attackers by monitoring the ports associated with WEB services. The response acquisition layer learns the response message corresponding to each malicious request by forwarding the received malicious request to the public network and collecting the response messages, and a response message is finally returned to the attacker. The log recording layer records the content of the attack in the access log database. The artificial intelligence layer uses a reinforcement learning algorithm to select, from the large number of response messages obtained from public network IPs, the response message the attacker most expects, replies to the attacker with that message, and records the details of the attack in the honeypot system log.

Further, the request collection layer collects malicious HTTP requests sent to the honeypot system and forwards them to the response acquisition layer.

The response acquisition layer receives a request message from the request collection layer and searches the request-response database for a tuple containing that message. If the tuple exists, the message is forwarded directly to the artificial intelligence layer; otherwise, the request message is forwarded to public network IPs on which the corresponding port is open, the response messages from those IPs are collected and stored in the request-response database, and finally the request message is forwarded to the artificial intelligence layer.

Further, the artificial intelligence layer selects a response message corresponding to the request message by using a reinforcement learning algorithm, namely the Q-Learning algorithm, and forwards the response message to the log recording layer.

The log recording layer records the request message from the attacker and the corresponding response message of the honeypot system.

Further, the reinforcement learning algorithm includes:

Step 108-1, the artificial intelligence layer receives the request message from the response acquisition layer and retrieves from the request-response database all response messages from the public network that correspond to the request message;

Step 108-2, in the initial state, one of these response messages is selected at random and sent to the attacker, and the attacker's next attack action is then recorded; if there is no next action, the closing of the connection is recorded; this process is repeated a number of times;

Step 108-3, the request message, the response message and the attacker's next operation (the reward in the Q-Learning algorithm) are classified, quantified and stored in the artificial intelligence database, forming a session table between the attacker and the honeypot system (the Q table in the Q-Learning algorithm);

Step 108-4, the artificial intelligence layer reads the session table and runs the Q-Learning routine to find the response message that maximizes the number of interaction rounds between the attacker and the honeypot system, and stores that message;

Step 108-5, the artificial intelligence layer replies to the attacker with the selected response message.
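Steps 108-2 to 108-5 can be illustrated with a small tabular Q-Learning selector. This is a minimal sketch under stated assumptions, not the patented implementation: the class name `ResponseSelector`, the use of the raw request string as the state, the reward values, and the parameters `alpha`, `gamma` and `epsilon` are all hypothetical choices made for the example.

```python
import random
from collections import defaultdict


class ResponseSelector:
    """Tabular Q-Learning over (request, response) pairs (illustrative only).

    State  = the attacker's request message, used here as a plain string key.
    Action = one candidate response message collected from the public network.
    Reward = supplied by the caller, e.g. 1.0 if the attacker sent a follow-up
             request and 0.0 if the connection closed (assumed values).
    """

    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.1, seed=None):
        self.alpha = alpha          # learning rate
        self.gamma = gamma          # discount factor
        self.epsilon = epsilon      # probability of exploring at random
        self.q = defaultdict(float) # session table: (request, response) -> value
        self.rng = random.Random(seed)

    def select(self, request, candidates):
        """Pick a response: random while untrained or exploring, else best-valued."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(candidates)
        best = max(self.q[(request, r)] for r in candidates)
        # Ties are broken at random, so an empty table yields a random choice.
        ties = [r for r in candidates if self.q[(request, r)] == best]
        return self.rng.choice(ties)

    def update(self, request, response, reward, next_request=None, next_candidates=()):
        """Fold the attacker's next action back into the table (one Q-Learning step)."""
        future = 0.0
        if next_request is not None and next_candidates:
            future = max(self.q[(next_request, r)] for r in next_candidates)
        key = (request, response)
        self.q[key] += self.alpha * (reward + self.gamma * future - self.q[key])
```

Because every entry starts at zero and ties are broken at random, an untrained selector behaves like the random choice of step 108-2; as rewards accumulate, `select` converges on the responses that kept attackers engaged.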

Further, the active port database stores public network IPs and their open ports, and mainly comprises two fields: port number and IP address.

Further, the request-response database stores request messages and their corresponding response messages, and mainly comprises fields such as port number, request message and response message.

Further, the artificial intelligence database stores the Q table generated by the artificial intelligence layer, and mainly comprises fields such as request message, response reward and port number.

Further, the access log database records the WEB attack messages sent to the honeypot system and the corresponding response messages of the honeypot system, and mainly comprises fields such as system time, request message, response message and port number.
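As an illustration of the four databases and the fields listed above, they could be laid out as four tables in a single SQLite file. This is only a sketch: the patent does not name a database engine, and the table names, column names, column types and the helper `open_store` are assumptions introduced for the example. The `reward` column is one reading of the "response reward" field.

```python
import sqlite3

# Hypothetical schema: one table per database described in the text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS active_port (
    ip   TEXT    NOT NULL,
    port INTEGER NOT NULL,
    PRIMARY KEY (ip, port)
);
CREATE TABLE IF NOT EXISTS request_response (
    port     INTEGER NOT NULL,
    request  TEXT    NOT NULL,
    response TEXT    NOT NULL
);
CREATE TABLE IF NOT EXISTS q_table (
    port     INTEGER NOT NULL,
    request  TEXT    NOT NULL,
    response TEXT    NOT NULL,
    reward   REAL    NOT NULL DEFAULT 0.0,
    PRIMARY KEY (port, request, response)
);
CREATE TABLE IF NOT EXISTS access_log (
    system_time TEXT    NOT NULL,
    port        INTEGER NOT NULL,
    request     TEXT    NOT NULL,
    response    TEXT    NOT NULL
);
"""


def open_store(path=":memory:"):
    """Open (or create) the store and make sure all four tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Keeping `request_response` free of a uniqueness constraint allows several public hosts to contribute different responses to the same request, which is what gives the artificial intelligence layer something to choose from.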

An attack prevention method based on artificial intelligence is applied to the above WEB high-interaction honeypot system based on artificial intelligence. After a malicious HTTP request is received, the response acquisition layer of the honeypot system acquires response messages by means of public network IPs, and the artificial intelligence layer then screens the acquired response messages and forwards the selected one to the attacker. The complete interaction process between the honeypot system and the attacker comprises the following steps:

Step 101, an attacker attempts to attack an open port of the honeypot system; the honeypot system keeps the TCP connection with the attacker open, and the request collection layer temporarily stores the HTTP request message from the attacker and forwards it to the response acquisition layer;

Step 102, the response acquisition layer attempts to retrieve a tuple containing the request message from the request-response database;

Step 103, if the response acquisition layer finds the tuple, steps 104, 105 and 106 are skipped and execution continues from step 107;

Step 104, if the response acquisition layer does not find a tuple containing the request message, it queries the active port database for public network IPs on which the corresponding port is open;

Step 105, the response acquisition layer forwards the request message to the corresponding port of each public network IP found in step 104 and waits, for a set period of time, to receive the response messages to the request;

Step 106, the response acquisition layer stores the acquired response messages, together with the request message and the port number, in the request-response database;

Step 107, the response acquisition layer forwards the request message to the artificial intelligence layer;

Step 108, the artificial intelligence layer uses the Q-Learning algorithm to screen out, from the obtained response messages, the one that best meets the attacker's expectation, sends it to the attacker, and finally forwards the request message and the selected response message to the log recording layer;

Step 109, the log recording layer writes the interaction data of the honeypot system into the honeypot log.
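The control flow of steps 101 to 109 can be sketched in memory as follows. Everything named here is hypothetical: the class `Honeypot`, the injected `fetch` and `select` callables, and the use of Python dictionaries in place of the databases are assumptions made so that the example is self-contained; real network forwarding and the Q-Learning selection are deliberately left to the caller.

```python
import time


class Honeypot:
    """In-memory sketch of steps 101-109; all names are illustrative."""

    def __init__(self, active_ports, fetch, select):
        self.active_ports = active_ports  # {port: [public IP, ...]}
        self.fetch = fetch                # fetch(ip, port, request) -> response or None
        self.select = select              # select(request, candidates) -> response
        self.request_response = {}        # {(port, request): [response, ...]}
        self.access_log = []              # [(timestamp, port, request, response)]

    def handle(self, port, request):
        key = (port, request)                             # step 101: request received
        if key not in self.request_response:              # steps 102-103: look up tuple
            responses = []
            for ip in self.active_ports.get(port, []):    # step 104: hosts with port open
                reply = self.fetch(ip, port, request)     # step 105: forward and wait
                if reply is not None:
                    responses.append(reply)
            if responses:                                 # step 106: store what was learned
                self.request_response[key] = responses
        candidates = self.request_response.get(key, [])   # step 107: hand over to AI layer
        if not candidates:
            return None                                   # no real host answered
        chosen = self.select(request, candidates)         # step 108: choose a response
        self.access_log.append((time.time(), port, request, chosen))  # step 109: log
        return chosen
```

A second identical request is answered from the stored tuple without contacting the public network again, which is the shortcut described in step 103.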

Further, the method uses a response message selection algorithm based on reinforcement learning, and the training process of the response message selection function of the artificial intelligence layer comprises the following steps:

Step 108-4, the artificial intelligence layer reads the session table and runs the Q-Learning routine to find the response message that maximizes the number of interaction rounds between the attacker and the honeypot system, and stores that message;

The WEB high-interaction honeypot system based on artificial intelligence simulates real vulnerable services while occupying few network resources. It also supports multi-round interaction with an attacker and thus achieves high interaction performance. In addition, an artificial intelligence algorithm is introduced into the selection of response messages, which further improves the interaction performance of the honeypot system. The system takes a malicious HTTP request as input, and obtains the responses of real WEB services to the request message by storing the message and forwarding it to public network IPs. It then uses artificial intelligence technology to select, from the large number of response messages obtained from public network IPs, the one the attacker most expects, replies to the attacker with that message, and records the details of the attack in the honeypot system log. In short, the method combines the low resource usage and simple deployment of low-interaction honeypots with the high degree of camouflage and strong interaction capability of high-interaction honeypots, and minimizes the respective drawbacks of the two types; the use of artificial intelligence technology further improves the interaction performance of the honeypot system to a certain extent.

The conception, the specific structure and the technical effects of the present invention will be further described with reference to the accompanying drawings to fully understand the objects, the features and the effects of the present invention.

Drawings

FIG. 1 is a schematic block diagram of a preferred embodiment of the present invention;

FIG. 2 is a flow chart illustrating implementation of a response acquisition layer in accordance with a preferred embodiment of the present invention;

FIG. 3 is a schematic diagram of the artificial intelligence layer training process of a preferred embodiment of the present invention.

Wherein: 1-request collection layer, 2-response acquisition layer, 3-artificial intelligence layer, 4-log recording layer, 5-active port database, 6-request-response database, 7-access log database and 8-artificial intelligence database.

Detailed Description

The technical contents of the preferred embodiments of the present invention will be more clearly and easily understood by referring to the drawings attached to the specification. The present invention may be embodied in many different forms of embodiments and the scope of the invention is not limited to the embodiments set forth herein.

A preferred embodiment of the invention consists of four layers, namely request collection layer 1, response acquisition layer 2, artificial intelligence layer 3 and log recording layer 4, and four databases, namely active port database 5, request-response database 6, access log database 7 and artificial intelligence database 8. The system collects an attacker's malicious requests by monitoring the ports associated with WEB services, learns the optimal response message corresponding to each malicious request by forwarding the received requests to the public network and collecting the response messages, replies to the attacker with the response message the attacker most expects, and records the content of the attack in the access log database.

As shown in FIG. 1, a preferred embodiment of the present invention comprises:

1) Request collection layer 1: collects malicious HTTP requests sent to the honeypot system and forwards them to the response acquisition layer.

2) Response acquisition layer 2: receives a request message from the request collection layer and searches the request-response database for a tuple containing that message. If the tuple exists, the message is forwarded directly to the artificial intelligence layer; otherwise, the request message is forwarded to public network IPs on which the corresponding port is open, the response messages from those IPs are collected and stored in the request-response database, and finally the request message is forwarded to the artificial intelligence layer.

3) Artificial intelligence layer 3: selects a response message corresponding to the request message by using a classic reinforcement learning algorithm, namely the Q-Learning algorithm, and forwards the response message to the log recording layer.

4) Log recording layer 4: records the request message from the attacker and the corresponding response message of the honeypot system.

5) Active port database 5: stores public network IPs and their open ports, and mainly comprises two fields: port number and IP address.

6) Request-response database 6: stores request messages and their corresponding response messages, and mainly comprises fields such as port number, request message and response message.

7) Artificial intelligence database 8: stores the Q table generated by the artificial intelligence layer, and mainly comprises fields such as request message, response reward and port number.

8) Access log database 7: records the WEB attack messages sent to the honeypot system and the corresponding response messages of the honeypot system, and mainly comprises fields such as system time, request message, response message and port number.

As shown in FIG. 2, the interaction between the attacker and the response acquisition layer of the honeypot system proceeds as follows. First, an attacker attempts to attack an open port of the honeypot system; the honeypot system keeps the TCP connection with the attacker open, and the request collection layer temporarily stores the HTTP request message from the attacker and forwards it to the response acquisition layer. The response acquisition layer then attempts to retrieve a tuple containing the request message from the request-response database. If it finds such a tuple, it forwards the message directly to the artificial intelligence layer for further processing. Otherwise, it queries the active port database for public network IPs on which the corresponding port is open, forwards the request message to that port on each of the IPs found, and waits a set period of time to receive their response messages. Once the waiting and collection period has ended, the response acquisition layer stores the acquired response messages, together with the request message and the port number, in the request-response database and forwards the request message to the artificial intelligence layer. The artificial intelligence layer then selects a response message and replies to the attacker.
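The "forward, then wait a set period of time" behaviour of the response acquisition layer can be sketched with a thread pool and a deadline. This is an illustrative sketch only: the function `collect_responses`, its parameters, and the injected `fetch` callable are hypothetical, and the patent does not specify how the waiting period is implemented. The `cancel_futures` argument requires Python 3.9 or later.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def collect_responses(targets, request, fetch, timeout=3.0):
    """Forward `request` to every (ip, port) in `targets` in parallel and
    return the replies that arrive within `timeout` seconds, in target order.

    `fetch(ip, port, request)` is supplied by the caller and returns the
    response text, returns None, or raises on a network error.
    """
    if not targets:
        return []
    pool = ThreadPoolExecutor(max_workers=len(targets))
    futures = [pool.submit(fetch, ip, port, request) for ip, port in targets]
    done, _not_done = wait(futures, timeout=timeout)
    # Do not block the attacker's session on hosts that are still answering.
    pool.shutdown(wait=False, cancel_futures=True)
    replies = []
    for future in futures:
        # Keep only calls that finished in time and did not raise.
        if future in done and future.exception() is None:
            reply = future.result()
            if reply is not None:
                replies.append(reply)
    return replies
```

Hosts that are slow or refuse the connection simply contribute nothing, so the attacker's TCP connection is held open for at most roughly `timeout` seconds while the honeypot learns a response.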

As shown in FIG. 3, the training process of the response message selection algorithm of the artificial intelligence layer is as follows. First, the artificial intelligence layer receives a request message from the response acquisition layer and retrieves from the request-response database all response messages from the public network that correspond to the request message. In the initial state, the artificial intelligence layer selects one of these response messages at random and sends it to the attacker, then records the attacker's next attack action (if there is no next action, the closing of the connection is recorded), and repeats this process a number of times. Meanwhile, the artificial intelligence layer classifies, quantifies and stores the request message, the response message and the attacker's next operation (the reward in the Q-Learning algorithm) in the artificial intelligence database, forming a session table between the attacker and the honeypot system (the Q table in the Q-Learning algorithm). Finally, the artificial intelligence layer reads the session table and runs the Q-Learning routine to find the response message that maximizes the number of interaction rounds between the attacker and the honeypot system, and stores that message. After the message has been selected, the artificial intelligence layer replies to the attacker with it and forwards the record of the attack to the log recording layer.
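The "read the session table and run Q-Learning" stage can be illustrated by replaying recorded sessions through the standard update rule. This is a sketch under assumptions, not the patented training procedure: the session format, the functions `train_from_sessions` and `best_response`, the reward of 1.0 for a turn that was followed by another request and 0.0 for a turn after which the connection closed, and the hyperparameters are all hypothetical.

```python
from collections import defaultdict


def train_from_sessions(sessions, alpha=0.5, gamma=0.9, sweeps=50):
    """Build a Q table from recorded sessions.

    Each session is a list of (request, response) turns in the order they
    occurred. Assumed reward: 1.0 if the attacker sent another request after
    the turn, 0.0 if the connection closed instead.
    """
    q = defaultdict(float)
    # Responses ever offered for each request, needed for the max over next actions.
    offered = defaultdict(set)
    for session in sessions:
        for request, response in session:
            offered[request].add(response)

    for _ in range(sweeps):
        for session in sessions:
            for i, (request, response) in enumerate(session):
                last_turn = i == len(session) - 1
                reward = 0.0 if last_turn else 1.0
                future = 0.0
                if not last_turn:
                    next_request = session[i + 1][0]
                    future = max(q[(next_request, r)] for r in offered[next_request])
                key = (request, response)
                q[key] += alpha * (reward + gamma * future - q[key])
    return q


def best_response(q, request, candidates):
    """Return the candidate with the highest learned value for this request."""
    return max(candidates, key=lambda r: q[(request, r)])
```

With this reward, the learned value of a response grows with the number of further rounds it tended to produce, which matches the stated goal of maximizing the number of interaction rounds.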

For the honeypot system, the invention designs a set of algorithms for collecting and responding to malicious requests. The complete interaction process between the honeypot system and the attacker realized by these algorithms comprises the following steps:

Step 108, the artificial intelligence layer uses the Q-Learning algorithm to screen out, from the obtained response messages, the one that best meets the attacker's expectation, sends it to the attacker, and finally forwards the request message and the selected response message to the log recording layer.

The invention designs the response message selection algorithm of the artificial intelligence layer on the basis of the Q-Learning algorithm of reinforcement learning. The training process of the artificial intelligence layer realized by this algorithm comprises the following steps:

Step 108-2, in the initial state, one of the response messages is selected at random and sent to the attacker, and the attacker's next attack action is then recorded (if there is no next action, the closing of the connection is recorded); this process is repeated a number of times;

Step 108-3, the request message, the response message and the attacker's next operation (the reward in the Q-Learning algorithm) are classified, quantified and stored in the artificial intelligence database, forming a session table between the attacker and the honeypot system (the Q table in the Q-Learning algorithm);

Step 108-4, the artificial intelligence layer reads the session table and runs the Q-Learning routine to find the response message that maximizes the number of interaction rounds between the attacker and the honeypot system, and stores that message.

The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims

1. A WEB high-interaction honeypot system based on artificial intelligence, characterized by comprising four layers, namely a request collection layer, a response acquisition layer, an artificial intelligence layer and a log recording layer, and four databases, namely an active port database, a request-response database, an artificial intelligence database and an access log database; the request collection layer collects malicious requests from attackers by monitoring the ports associated with WEB services; the response acquisition layer learns the response message corresponding to each malicious request by forwarding the received malicious request to the public network and collecting the response messages, and a response message is finally returned to the attacker; the log recording layer records the content of the attack in the access log database; the artificial intelligence layer uses a reinforcement learning algorithm to select, from the large number of response messages obtained from public network IPs, the response message the attacker most expects, replies to the attacker with that message, and records the details of the attack in the honeypot system log;

the reinforcement learning algorithm comprises:

2. The artificial intelligence based WEB high-interaction honeypot system according to claim 1, characterized in that the response acquisition layer receives a request message from the request collection layer and searches the request-response database for a tuple containing that message; if the tuple exists, the message is forwarded directly to the artificial intelligence layer; otherwise, the request message is forwarded to public network IPs on which the corresponding port is open, the response messages from those IPs are collected and stored in the request-response database, and finally the request message is forwarded to the artificial intelligence layer.

3. The artificial intelligence based WEB high interaction honeypot system according to claim 1, wherein the honeypot system is characterized in that

4. The artificial intelligence based WEB high-interaction honeypot system according to claim 1, characterized in that the active port database stores public network IPs and their open ports, and mainly comprises two fields: port number and IP address.

5. The artificial intelligence based WEB high interaction honeypot system according to claim 1, wherein the honeypot system is characterized in that

6. The artificial intelligence based WEB high-interaction honeypot system according to claim 1, characterized in that the artificial intelligence database stores the Q table generated by the artificial intelligence layer, and mainly comprises fields such as request message, response reward and port number.

7. The artificial intelligence based WEB high interaction honeypot system according to claim 1, wherein the honeypot system is characterized in that

8. An attack prevention method based on artificial intelligence, applied to the artificial intelligence based WEB high-interaction honeypot system according to any one of claims 1 to 7, wherein after a malicious HTTP request is received, the response acquisition layer of the honeypot system acquires response messages by means of public network IPs, and the artificial intelligence layer then screens the acquired response messages and forwards the selected one to the attacker, the complete interaction process between the honeypot system and the attacker comprising the following steps:

step 107, the response acquisition layer forwards the request message to the artificial intelligence layer;

step 109, the log recording layer writes the interactive data of the honeypot system into honeypot logs;

the anti-attack method based on artificial intelligence uses a response message selection algorithm based on reinforcement learning, and the training process of the response message selection function of the artificial intelligence layer comprises the following steps: