CN115208800A - Whole internet port scanning method and device based on reinforcement learning

Info

Publication number: CN115208800A
Application number: CN202211129938.4A
Authority: CN
Inventors: 杨家海; 宋光磊; 何林; 李城龙; 王之梁; 张辉
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2022-09-16
Filing date: 2022-09-16
Publication date: 2022-10-18
Anticipated expiration: 2042-09-16
Also published as: CN115208800B

Abstract

The invention discloses an Internet-wide port scanning method and device based on reinforcement learning. The method comprises: dividing the Internet into a plurality of target networks, performing full port scanning on a preset number of active addresses in each target network, and constructing an open port association graph from the port opening information obtained by scanning; recommending candidate ports for undetected active addresses in each target network according to the open port association graph, and scanning the candidate ports to obtain port scanning feedback results; updating the expected rewards of the candidate ports based on the port scanning feedback results, updating the open port association graph based on the updated expected rewards, and predicting the candidate ports of the next active address to be scanned in each target network according to the updated open port association graph; and completing the port scanning task of a target network when the number of ports probed in that target network reaches a probe number threshold. The invention preferentially scans the ports that are more likely to be open, thereby improving probe utilization.

Description

Whole internet port scanning method and device based on reinforcement learning

Technical Field

The invention relates to the technical field of networks, in particular to a full internet port scanning method and device based on reinforcement learning.

Background

Internet-wide scanning is a common technique in network measurement studies, for example for measuring service deployment and security vulnerabilities. However, these studies are limited to a given set of ports, so they do not fully capture real network conditions and may even lead to misleading conclusions.

Disclosure of Invention

The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.

To this end, the invention provides an Internet-wide port scanning method and device based on reinforcement learning which, by means of the PMap port scanning tool, reduce the number of ports scanned and the intrusiveness of port scanning. The system remedies the shortcomings of existing scanning tools and effectively supports subsequent service discovery and security research across the whole network.

In order to achieve the above object, in one aspect, the present invention provides a full internet port scanning method based on reinforcement learning, including:

dividing the internet into a plurality of target networks, and carrying out full port scanning on a preset number of active addresses in each target network so as to construct an open port association diagram according to port open information obtained by scanning;

recommending candidate ports of undetected active addresses in each target network according to the open port association diagram, and scanning the candidate ports to obtain port scanning feedback results;

updating the expected rewards of the candidate ports based on the port scanning feedback results, updating the open port association diagram based on the updated expected rewards, and predicting the candidate ports of the next active address to be scanned in each target network according to the updated open port association diagram; and

and when the number of the detection ports of each target network reaches a detection number threshold value, completing a port scanning task of the target network.

In addition, the reinforcement learning-based all-internet port scanning method according to the above embodiment of the present invention may further have the following additional technical features:

Further, in an embodiment of the present invention, performing full port scanning on a preset number of active addresses in each target network so as to construct an open port association diagram according to the port opening information obtained by scanning includes:

selecting a preset number of active addresses from a target network to perform full-port scanning, and acquiring port opening information;

calculating the port opening probability of the full port based on the port opening information to obtain an initialized port opening probability;

and constructing the open port association diagram according to the initialized port open probability and a preset weight calculation formula.

Further, in an embodiment of the present invention, the recommending, according to the open port association map, a candidate port of an undetected active address in each target network includes:

when scanning the ports of undetected active addresses in a target network, selecting the port node with the highest probability as an entry node based on the open port association diagram;

judging the opening state of the port corresponding to the entry node, and updating the port opening probability corresponding to the port node with the highest probability according to the opening state judgment result; and

and calculating the port opening probabilities corresponding to other port nodes pointed by the port node with the highest probability according to a preset probability calculation formula to obtain the posterior probability of port opening so as to recommend the candidate ports according to the updated port opening probability.

Further, in an embodiment of the present invention, the preset number of active addresses are seed addresses, and the method further includes: obtaining a prior reward of open port $i$ based on a pre-scanning mechanism:

$$R_0(i) = \frac{n_i}{k}$$

where $k$ denotes the number of seed addresses and $n_i$ denotes the number of seed addresses on which port $i$ is open.

Further, in an embodiment of the present invention, the method further includes: the reward of scanning port $i$ on an active address of each target network is:

$$r(i) = \begin{cases} 1, & \text{if port } i \text{ is open} \\ 0, & \text{otherwise} \end{cases}$$

after the port scan of an active address is completed, the reward of open port $i$ is updated as:

$$R_n(i) = \frac{1}{n} \sum_{j=1}^{n} r_j(i)$$

where $R_n(i)$ denotes the reward after $n$ scans of port $i$, and $r_j(i)$ denotes the reward of the $j$-th scan of port $i$;

and the open port association diagram is updated according to the updated reward of port $i$, using an update formula whose coefficients are weights.

In order to achieve the above object, another aspect of the present invention provides an all internet port scanning apparatus based on reinforcement learning, including:

the association diagram building module is used for dividing the internet into a plurality of target networks, carrying out full port scanning on a preset number of active addresses in each target network and building an open port association diagram according to port opening information obtained by scanning;

the port scanning module is used for recommending candidate ports of undetected active addresses in each target network according to the open port association diagram and scanning the candidate ports to obtain port scanning feedback results;

the reward and graph updating module is used for updating the expected rewards of the candidate ports based on the port scanning feedback results, updating the open port association diagram based on the updated expected rewards, and predicting the candidate ports of the next active address to be scanned in each target network according to the updated open port association diagram; and

and the scanning completion module is used for completing the port scanning task of one target network when the number of the detection ports of each target network reaches the detection number threshold.

According to the reinforcement learning-based Internet-wide port scanning method and device of the embodiments of the invention, ports that are more likely to be open are scanned preferentially to improve probe utilization, the shortcomings of existing scanning tools are remedied, and subsequent service discovery and security research across the whole network are effectively supported.

Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a flowchart of a reinforcement learning-based full Internet port scanning method according to an embodiment of the present invention;

FIG. 2 is a flow diagram of a PMap operation according to an embodiment of the present invention;

FIG. 3 is a diagram illustrating an open port dependency graph model according to an embodiment of the present invention;

FIG. 4 is a diagram illustrating dynamic recommendation for port scanning for an active address according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a reinforcement learning-based Internet-wide port scanning apparatus according to an embodiment of the present invention.

Detailed Description

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The following describes a reinforcement learning-based all-internet port scanning method and apparatus according to an embodiment of the present invention with reference to the accompanying drawings.

The invention exploits empirically observed correlations among open ports, namely the similarity of open ports within the same network and the association between open ports. Fig. 2 shows the main workflow of PMap in one network. When scanning ports on the active addresses of each network, PMap first constructs an open port association graph for each target network by scanning, in advance, all ports of a few active addresses in that network. Using this existing knowledge (the constructed open port association graph), PMap recommends candidate ports (target ports) for each active address in the network that needs to be scanned. PMap also defines the expected reward of port scanning (i.e., the reward of scanning a given port) to estimate the probability that each port is open. After all candidate ports of one active address have been scanned, PMap updates the expected rewards according to the port scanning feedback results and synchronously updates the open port association graph, so as to adjust the port scanning order for the next active address.

Fig. 1 is a flowchart of a reinforcement learning-based all internet port scanning method according to an embodiment of the present invention.

As shown in fig. 1, the method includes, but is not limited to, the following steps:

s1, dividing the Internet into a plurality of target networks, and carrying out full port scanning on a preset number of active addresses in each target network so as to construct an open port association diagram according to port opening information obtained by scanning.

Specifically, the Internet is first divided into different networks, namely the IP prefixes announced through the Border Gateway Protocol (BGP). PMap then performs port scanning on the active addresses within each network.
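As an illustration of this partitioning step, the following minimal Python sketch assigns active addresses to the most specific prefix that contains them. The function name `group_by_prefix` and the sample prefixes are hypothetical; in practice the prefix list would be taken from BGP routing data, which is not shown here.

```python
import ipaddress
from collections import defaultdict


def group_by_prefix(addresses, prefixes):
    """Assign each address to the longest (most specific) prefix containing it.

    addresses: iterable of IP address strings
    prefixes:  iterable of CIDR strings (hypothetical stand-in for BGP data)
    Returns a dict mapping prefix string -> list of addresses.
    Addresses covered by no prefix are left out.
    """
    # Check longer prefixes first so that the most specific match wins.
    networks = sorted(
        (ipaddress.ip_network(p) for p in prefixes),
        key=lambda net: net.prefixlen,
        reverse=True,
    )
    groups = defaultdict(list)
    for text in addresses:
        ip = ipaddress.ip_address(text)
        for net in networks:
            if ip.version == net.version and ip in net:
                groups[str(net)].append(text)
                break
    return dict(groups)


if __name__ == "__main__":
    targets = group_by_prefix(
        ["172.120.1.1", "172.121.5.5", "10.1.2.3", "8.8.8.8"],
        ["172.120.0.0/15", "172.121.0.0/16", "10.0.0.0/8"],
    )
    for prefix, members in targets.items():
        print(prefix, members)
```

Each resulting group can then be treated as one target network with its own seed addresses and its own association graph.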

It is observed by the embodiments of the present invention that the open ports are related, mainly in the following two aspects.

First, open ports within the same network are similar. Hosts in the same network tend to open similar ports; for example, in the 172.120.0.0/15 (EGIHosting) network, more than 92% of the active addresses have TCP/80 open.

Second, open ports are associated with one another, and the association is unidirectional. For example, a random scan of 1 million active IPv4 addresses found that an address responding on UDP/443 had an 86% chance of also responding on both TCP/443 and TCP/80, and an active address with TCP/443 open had an 89% chance of having TCP/80 open. However, the association in the opposite direction (TCP/80 → TCP/443 → UDP/443) is less pronounced.

The embodiment of the invention can select a small number of addresses from the network to carry out full port scanning, mine the correlation of the open ports in the network and recommend ports which are more likely to be opened on other active addresses in the network to carry out scanning according to the correlation.

Specifically, to describe the association between different ports in the same network, PMap constructs a directed graph G, called the open port association graph. As shown in FIG. 3, each node $i$ in the graph represents the port number of an open port, and the nodes of G together represent the types of open ports in the network (corresponding to the similarity of open ports within the same network). Directed edges in the graph represent associations between ports (corresponding to the association between open ports), and the weight $w_{ij}$ of the edge from node $i$ to node $j$ is the conditional probability that port $j$ is open given that port $i$ is open (corresponding to the degree of association). The weight $w_{ij}$ is calculated as follows:

$$w_{ij} = P(j \mid i) = \frac{P(i, j)}{P(i)} = \frac{P(i \mid j)\,P(j)}{P(i)} \qquad (1)$$

where $P(i)$ is the probability that port $i$ is open, $P(j)$ is the probability that port $j$ is open, and $P(i, j)$ is the probability that both are open. The directed graph G contains only ports that are open somewhere in the network (i.e., $P(i) > 0$).

Specifically, an open port association graph is constructed for each target network by scanning, in advance, the ports of a few active addresses in that network. When a target network is to be port scanned, a pre-scanning mechanism is used to establish the open port association graph: a small number of active addresses (seed addresses) are selected in the network and all of their ports are scanned, and the open probability of the port corresponding to each node is calculated from the seed addresses. The weights between the nodes in the open port association graph are then calculated according to Equation 1. Because open ports within the same network are similar, scanning all ports of the seed addresses captures the types of open ports in the target network and avoids losing port information when the association graph is constructed. Preferably, when the open port association graph is constructed, a TCP or UDP protocol attribute is added to each node to distinguish the protocol of the corresponding port.
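The pre-scanning step can be sketched as follows. This is a minimal illustration rather than PMap's actual implementation: the function name `build_association_graph`, the representation of a port as a `(protocol, number)` tuple, and the empirical estimate of an edge weight as the fraction of seed addresses with port i open that also have port j open are assumptions made for this example.

```python
from collections import Counter
from itertools import permutations


def build_association_graph(seed_scans):
    """Build node probabilities and edge weights from full-port seed scans.

    seed_scans: list with one set of open ports per seed address; a port is
                a (protocol, number) tuple such as ("tcp", 80).
    Returns (prob, weight):
      prob[i]        = fraction of seed addresses with port i open
      weight[(i, j)] = estimated P(j open | i open)
    """
    k = len(seed_scans)
    if k == 0:
        raise ValueError("at least one seed address is required")

    open_count = Counter()   # number of seeds on which each port is open
    pair_count = Counter()   # number of seeds on which both i and j are open
    for ports in seed_scans:
        open_count.update(ports)
        pair_count.update(permutations(ports, 2))  # ordered pairs, i != j

    prob = {port: count / k for port, count in open_count.items()}
    weight = {
        (i, j): count / open_count[i] for (i, j), count in pair_count.items()
    }
    return prob, weight


if __name__ == "__main__":
    seeds = [
        {("tcp", 80), ("tcp", 443)},
        {("tcp", 80)},
        {("tcp", 80), ("tcp", 443), ("udp", 443)},
        {("tcp", 22)},
    ]
    prob, weight = build_association_graph(seeds)
    print(prob[("tcp", 80)])
    print(weight[(("tcp", 443), ("tcp", 80))])
    print(weight[(("tcp", 80), ("tcp", 443))])
```

Only ports seen open on at least one seed address become nodes, and the two directions of an edge generally carry different weights, which reflects the unidirectional association described earlier.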

And S2, recommending candidate ports of undetected active addresses in each target network according to the open port association diagram, and scanning the candidate ports to obtain port scanning feedback results.

The constructed open port association graph reflects the types of open ports in the target network and their open probabilities. In the extreme case where all addresses in a network are selected as seed addresses for pre-scanning, the open port association graph represents the true open port types and their true open probabilities. Once the open port association graph of the target network has been constructed, PMap can optimize the port scanning order for undetected addresses in the target network according to the graph, preferentially selecting ports that are more likely to be open. For each active address that needs port scanning, PMap uses a greedy method to dynamically recommend the port with the highest open probability in the open port association graph, which guides the direction of port scanning. The specific recommendation process is as follows:

Step 1: Select the entry node for port scanning. To save scanning resources, PMap preferentially scans ports that are more likely to be open. Therefore, when scanning the ports of an active address, it selects the port node $i$ with the highest open probability in G as the entry node of the scanning process:

$$i = \arg\max_{s \in S} P(s) \qquad (2)$$

where $S$ denotes the set of all nodes (corresponding to the types of open ports) and $P(s)$ denotes the open probability of port $s$.

Step 2: Update the port open probabilities. Because open ports are correlated, one open port provides additional information about its associated ports. When port $i$ is scanned and found to be open, the open probability of port $i$ in G is set to 1, and the open probabilities of all nodes that node $i$ points to are then updated. Let $N(i)$ denote the set of all nodes that node $i$ points to. For each node $j$ in $N(i)$, the open probability is set to the larger of the average open probability obtained by scanning the seed addresses and the posterior probability given that port $i$ is open, according to Equation 3:

$$P(j) \leftarrow \max\bigl(P(j),\, w_{ij}\bigr), \quad j \in N(i) \qquad (3)$$

where $w_{ij} = P(j \mid i)$ is the weight of the edge from node $i$ to node $j$.

If port $i$ is not open, only the open probability of port $i$ in G is updated, to 0.

After updating the port opening probability, the PMap repeats the step 1, and selects the node with the maximum port opening probability in the G for scanning. The PMap then uses step 2 to update the port opening probability in G. This dynamic scan loops until a limit on the number of port probes per active address is reached. Fig. 4 shows in detail the dynamic recommendation process of port scanning at an active address.
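The two steps above can be sketched as a short loop. This is an illustrative sketch only: the function name `scan_address`, the flat `weight[(i, j)]` dictionary, and the `is_open` callback (a stand-in for sending a real probe) are assumptions for the example, and ties between equally likely ports are broken by choosing the lowest port, which the text does not specify.

```python
def scan_address(prob, weight, is_open, max_probes):
    """Greedy, dynamically updated port recommendation for one active address.

    prob:       dict port -> open probability from the seed addresses
    weight:     dict (i, j) -> P(j open | i open)
    is_open:    callable(port) -> bool; hypothetical probe function
    max_probes: limit on the number of ports probed for this address
    Returns (scanned, found): ports in probe order, and the open ones.
    """
    # Adjacency view: successors[i][j] = P(j open | i open).
    successors = {}
    for (i, j), w in weight.items():
        successors.setdefault(i, {})[j] = w

    p = dict(prob)          # working copy; the shared graph is not modified
    remaining = set(p)
    scanned, found = [], []
    while remaining and len(scanned) < max_probes:
        # Step 1: pick the unscanned port with the highest open probability.
        # sorted() makes ties deterministic (the lowest port wins).
        i = max(sorted(remaining), key=p.get)
        remaining.remove(i)
        scanned.append(i)
        # Step 2: update probabilities according to the probe result.
        if is_open(i):
            found.append(i)
            p[i] = 1.0
            for j, w in successors.get(i, {}).items():
                if j in remaining:
                    p[j] = max(p[j], w)
        else:
            p[i] = 0.0
    return scanned, found


if __name__ == "__main__":
    prob = {80: 0.6, 443: 0.3, 8443: 0.05, 22: 0.2}
    weight = {(80, 443): 0.5, (443, 8443): 0.9}
    host_ports = {80, 443, 8443}
    print(scan_address(prob, weight, host_ports.__contains__, max_probes=3))
```

In the usage example, port 8443 has a low prior probability but is probed third because the open result on port 443 raises its probability to 0.9; a static ordering by prior probability would have probed port 22 instead.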

S3, updating the expected rewards of the candidate ports based on the port scanning feedback results, updating the open port association diagram based on the updated expected rewards, and predicting the candidate ports of the next active address to be scanned in each target network according to the updated open port association diagram; and

and S4, when the number of the detection ports of each target network reaches a detection number threshold, completing a port scanning task of one target network.

Specifically, after scanning all candidate ports (actions) of an active address, PMap updates the expected rewards of the corresponding open ports so as to better estimate the probability that each port is open in the target network. It then updates the association graph based on the port scanning rewards, so as to provide more accurate port recommendations for the next active address to be scanned.

PMap updates the rewards of scanned ports so that ports with high hit rates have a better chance of being scanned when the next active address is scanned. The initial (prior) reward of open port $i$ is obtained from the pre-scanning mechanism used to construct the open port association graph:

$$R_0(i) = \frac{n_i}{k}$$

where $k$ denotes the number of seed addresses and $n_i$ denotes the number of seed addresses on which port $i$ is open.

When port $i$ of an active address is scanned, the reward is 1 if port $i$ is open and 0 otherwise. The reward of scanning port $i$ on an active address is:

$$r(i) = \begin{cases} 1, & \text{if port } i \text{ is open} \\ 0, & \text{otherwise} \end{cases}$$

After the port scan (action) of an active address is completed, the reward of open port $i$ is updated as:

$$R_n(i) = \frac{1}{n} \sum_{j=1}^{n} r_j(i)$$

where $R_n(i)$ denotes the reward after $n$ scans of port $i$, and $r_j(i)$ denotes the reward of the $j$-th scan of port $i$.
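A minimal sketch of this reward bookkeeping is given below. It rests on two labeled assumptions that the text does not spell out: the updated reward is taken to be the mean of the per-scan rewards, and the k full-port seed scans are counted as the first k scans of port i, so that the running mean starts at the prior reward n_i / k. The class name `RewardTracker` and its methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RewardTracker:
    """Running mean of 0/1 scan rewards for a single port."""

    total: float = 0.0  # sum of per-scan rewards observed so far
    scans: int = 0      # number of scans observed so far

    @classmethod
    def from_seeds(cls, n_open, k):
        """Start from the prior reward n_open / k.

        Assumption: each of the k seed addresses counts as one scan.
        """
        return cls(total=float(n_open), scans=k)

    @property
    def value(self):
        """Current reward; also usable as the estimated open probability."""
        return self.total / self.scans if self.scans else 0.0

    def update(self, port_open):
        """Record one scan: reward 1 if the port was open, otherwise 0."""
        self.total += 1.0 if port_open else 0.0
        self.scans += 1
        return self.value


if __name__ == "__main__":
    tracker = RewardTracker.from_seeds(n_open=3, k=4)
    print(tracker.value)          # prior reward from the seed addresses
    print(tracker.update(True))   # port found open on the next address
    print(tracker.update(False))  # port found closed on the one after
```

Keeping a sum and a count avoids storing every individual scan result while still giving the same mean.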

The open frequency of each port is the proportion of scanned addresses on which that port is open. Because open ports within the same network are correlated, the open probability of a port can be approximated by its open frequency on previously scanned addresses. From Equation 4, the reward $R_n(i)$ after $n$ scans of port $i$ therefore also represents the probability that port $i$ is open, i.e.:

$$P(i) = R_n(i)$$

Due to sampling bias in the seed addresses, the constructed open port association graph may not fully characterize the open ports of the whole target network. To solve this problem, after PMap scans an address in the target network, it constructs a more reliable open port association graph from the updated rewards. The update formula uses weights as its coefficients.

In embodiments of the present invention, this dynamic scan-update-adjust loop continues until the total number of probe packets reaches the budget limit.

Further, as the algorithm iterates, the rewards of ports with high open probabilities become higher and higher, so more of the budget is allocated to these ports, which eventually leads to convergence of the algorithm. However, due to sampling bias during initialization, some open ports in the target network may be missed, so that these port types are absent from the constructed open port association graph. If the algorithm converges prematurely on other ports, the missed ports will never be found. This is a typical exploration-exploitation dilemma, and converging too early on a local maximum is known as premature convergence. Taking the characteristics of port scanning into account, the present invention uses the ϵ-greedy strategy to strengthen PMap's exploration. The ϵ-greedy strategy allows the port scanning budget to be controlled precisely in advance. More specifically, PMap explores with probability ϵ: when the exploration mechanism is triggered, PMap scans all ports of the active address to recover port types lost to sampling bias. Otherwise, with probability 1-ϵ, PMap exploits the constructed association graph and scans the ports that the graph recommends.
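The ϵ-greedy choice can be sketched as below. The function name `ports_for_next_address`, the `rng` parameter, and the port range 1 to 65535 are assumptions for the example; the text only states that exploration scans all ports and exploitation scans the recommended ones.

```python
import random

ALL_PORTS = range(1, 65536)  # assumed full port range, 1..65535


def ports_for_next_address(recommended, epsilon, rng=random):
    """Return the ports to probe on the next active address.

    With probability epsilon (exploration) every port is scanned;
    otherwise (exploitation) only the recommended ports are scanned.
    rng is any object with a random() method returning a float in [0, 1).
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    if rng.random() < epsilon:
        return list(ALL_PORTS)
    return list(recommended)


if __name__ == "__main__":
    rng = random.Random(2022)
    explored = sum(
        len(ports_for_next_address([80, 443], 0.01, rng)) == len(ALL_PORTS)
        for _ in range(10_000)
    )
    print("full scans out of 10000 addresses:", explored)
```

Under this sketch the expected number of probes per address is ϵ times the number of all ports plus (1-ϵ) times the number of recommended ports, which is what makes the scanning budget predictable in advance.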

In summary, the present invention introduces PMap, a port scanning tool that can efficiently find most of the open ports among the 65K ports across the whole network. PMap uses the correlation between ports to construct an open port association graph for each network, uses a reinforcement learning framework to update the graph according to scanning feedback, and dynamically adjusts the port scanning order. Compared with current port scanning methods, PMap achieves better performance in terms of hit rate, coverage, and intrusiveness. Experiments on a real network show that PMap can find 90% of open ports by scanning only 151 ports per active address (90% @ 151), which is about 311 times fewer ports than full port scanning requires (90% @ 47K) and about 5 times fewer than common-port scanning requires (90% @ 729). PMap thus reduces both the number of ports scanned and the intrusiveness of port scanning. PMap is the first effective practice of using reinforcement learning to scan for open ports. The system remedies the shortcomings of existing scanning tools and effectively supports subsequent service discovery and security research across the whole network.
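The reduction factors quoted above follow from the reported port counts, taking 47K as approximately 47,000 ports:

```latex
\frac{47000}{151} \approx 311.3 \approx 311,
\qquad
\frac{729}{151} \approx 4.8 \approx 5
```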

According to the reinforcement learning-based all-Internet port scanning method, the relevance of the open ports is fully utilized, and the ports which are more likely to be open are preferentially scanned to improve the detection utilization rate.

In order to implement the above embodiment, as shown in FIG. 5, the embodiment further provides a reinforcement learning-based Internet-wide port scanning apparatus 10, where the apparatus 10 includes: the association graph building module 100, the port scanning module 200, the reward and graph updating module 300, and the scan completion module 400.

The association graph building module 100 is configured to divide the internet into a plurality of target networks, perform full port scanning on a preset number of active addresses in each target network, and build an open port association graph according to port opening information obtained through scanning;

the port scanning module 200 is configured to recommend a candidate port of an undetected active address in each target network according to the open port association map, and scan the candidate port to obtain a port scanning feedback result;

the reward and graph updating module 300 is configured to update the expected rewards of the candidate ports based on the port scanning feedback results, update the open port association graph based on the updated expected rewards, and predict the candidate ports of the next active address to be scanned in each target network according to the updated open port association graph; and

a scan completion module 400, configured to complete a port scan task of one target network when the number of probe ports of each target network reaches the probe number threshold.

Further, the association graph building module 100 is further configured to:

calculating the port opening probability of all ports based on the port opening information to obtain an initialized port opening probability;

and constructing an open port association diagram according to the initialized port open probability and a preset weight calculation formula.

Further, the port scanning module 200 is further configured to:

when scanning ports of undetected active addresses in a target network, selecting a port node with the highest probability as an entry node based on an open port association diagram;

and calculating the port opening probabilities corresponding to all other port nodes pointed by the port node with the highest probability according to a preset probability calculation formula to obtain the posterior probability of port opening so as to recommend the candidate ports according to the updated port opening probability.

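Further, the preset number of active addresses are seed addresses, and the reward and graph updating module 300 is further configured to obtain a prior reward of open port $i$ based on a pre-scanning mechanism:

$$R_0(i) = \frac{n_i}{k}$$

where $k$ denotes the number of seed addresses and $n_i$ denotes the number of seed addresses on which port $i$ is open.

Further, the reward and graph updating module 300 is further configured such that the reward of scanning port $i$ on an active address of each target network is:

$$r(i) = \begin{cases} 1, & \text{if port } i \text{ is open} \\ 0, & \text{otherwise} \end{cases}$$

after the port scan of an active address is completed, the reward of open port $i$ is updated as:

$$R_n(i) = \frac{1}{n} \sum_{j=1}^{n} r_j(i)$$

where $R_n(i)$ denotes the reward after $n$ scans of port $i$, and $r_j(i)$ denotes the reward of the $j$-th scan of port $i$;

and the open port association graph is updated according to the updated reward of port $i$, using an update formula whose coefficients are weights.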

According to the reinforcement learning-based all-Internet port scanning device disclosed by the embodiment of the invention, the relevance of the open ports is fully utilized, and the ports which are more likely to be open are preferentially scanned to improve the utilization rate of detection.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless explicitly specified otherwise.

In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, such uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, one skilled in the art can combine the various embodiments or examples described in this specification, and the features of different embodiments or examples, provided they do not contradict one another.

Although embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are exemplary and not to be construed as limiting the present invention, and that changes, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims

1. A full Internet port scanning method based on reinforcement learning is characterized by comprising the following steps:

updating the expected rewards of the candidate ports based on the port scanning feedback results, updating the open port association diagram based on the updated expected rewards, and predicting the candidate ports of the next active address to be scanned in each target network according to the updated open port association diagram; and

2. The method of claim 1, wherein the performing full port scanning on a preset number of active addresses in each target network to construct an open port association map according to port opening information obtained by scanning comprises:

3. The method of claim 2, wherein recommending candidate ports of undetected active addresses in each target network according to the open port association map comprises:

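4. The method of claim 3, wherein the preset number of active addresses are seed addresses, and wherein the method further comprises: obtaining a prior reward of open port $i$ based on a pre-scanning mechanism:

$$R_0(i) = \frac{n_i}{k}$$

where $k$ denotes the number of seed addresses and $n_i$ denotes the number of seed addresses on which port $i$ is open.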

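5. The method of claim 4, further comprising: the reward of scanning port $i$ on an active address of each target network is:

$$r(i) = \begin{cases} 1, & \text{if port } i \text{ is open} \\ 0, & \text{otherwise} \end{cases}$$

after the port scan of an active address is completed, the reward of open port $i$ is updated as:

$$R_n(i) = \frac{1}{n} \sum_{j=1}^{n} r_j(i)$$

where $R_n(i)$ denotes the reward after $n$ scans of port $i$, and $r_j(i)$ denotes the reward of the $j$-th scan of port $i$;

and the open port association diagram is updated according to the updated reward of port $i$, using an update formula whose coefficients are weights.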

6. An all internet port scanning device based on reinforcement learning, comprising:

and the scanning completion module is used for completing a port scanning task of one target network when the number of the detection ports of each target network reaches a detection number threshold.

7. The apparatus of claim 6, wherein the dependency graph building module is further configured to:

8. The apparatus of claim 7, wherein the port scanning module is further configured to:

judging the opening state of the port corresponding to the entry node, and updating the port opening probability corresponding to the port node with the highest probability according to the judgment result of the opening state; and

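9. The apparatus of claim 8, wherein the preset number of active addresses are seed addresses, and wherein the reward and graph updating module is further configured to obtain a prior reward of open port $i$ based on a pre-scanning mechanism:

$$R_0(i) = \frac{n_i}{k}$$

where $k$ denotes the number of seed addresses and $n_i$ denotes the number of seed addresses on which port $i$ is open.

10. The apparatus of claim 9, wherein the reward and graph updating module is further configured such that the reward of scanning port $i$ on an active address of each target network is:

$$r(i) = \begin{cases} 1, & \text{if port } i \text{ is open} \\ 0, & \text{otherwise} \end{cases}$$

after the port scan of an active address is completed, the reward of open port $i$ is updated as:

$$R_n(i) = \frac{1}{n} \sum_{j=1}^{n} r_j(i)$$

where $R_n(i)$ denotes the reward after $n$ scans of port $i$, and $r_j(i)$ denotes the reward of the $j$-th scan of port $i$;

and the open port association diagram is updated according to the updated reward of port $i$, using an update formula whose coefficients are weights.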