CN117972568A - Host port activity prediction method, device, equipment and storage medium - Google Patents

Host port activity prediction method, device, equipment and storage medium

Info

Publication number
CN117972568A
Authority
CN
China
Prior art keywords
port
host
target
probability
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410385477.XA
Other languages
Chinese (zh)
Inventor
杨家海
李城龙
罗一睿
董恩焕
张辉
胡海娜
权晓文
王之梁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202410385477.XA
Publication of CN117972568A
Legal status: Pending

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a host port activity prediction method, a device, equipment and a storage medium, and relates to the technical field of network space mapping, wherein the method comprises the following steps: traversing a host to be detected to obtain at least two host characteristics corresponding to a target host; inputting the host characteristics into classifiers of the same type as the target types in the decision model based on the target types of the host characteristics, and outputting first prediction return values corresponding to all ports in the target host; all classifiers in the decision model are constructed based on naive Bayes algorithm; determining a target prediction return value corresponding to each port based on the first prediction return value with the same port for each port; the target prediction return value is used for representing the opening probability corresponding to the port under the condition that at least two host characteristics exist; and determining the active port of the target host based on the target prediction return values corresponding to all the ports. The method and the system can improve the prediction coverage rate, universality and prediction efficiency of the activity of the whole network port.

Description

Host port activity prediction method, device, equipment and storage medium
Technical Field
The present invention relates to the field of network space mapping technologies, and in particular, to a method, an apparatus, a device, and a storage medium for predicting activity of a host port.
Background
In network space mapping, in order to understand the distribution of internet assets, the activity of ports on hosts across the whole network needs to be comprehensively known; this serves as technical data for understanding part of the whole-network situation and for subsequent mapping work.
In the scanning process, because the network space is enormous, in the IPv4 space alone the scan targets formed by combining hosts and ports number at least 2^32 × 65536. Using the widely used ZMap scanner to scan port activity across the whole network at a bandwidth of 1 Gbps would take several decades to complete a single whole-network traversal, while scanning only a representative subset of ports may degrade and bias research results. Therefore, in the prior art, full-port scanning is generally accomplished by predicting port activity on hosts, but existing port activity prediction has low coverage and a narrow range of application.
Disclosure of Invention
The invention provides a method, a device, equipment and a storage medium for predicting the activity of a host port, which are used for solving the defects of lower coverage rate and narrower application range of port activity prediction in the prior art and improving the prediction coverage rate, universality and prediction efficiency of the activity of the whole network port.
The invention provides a host port activity prediction method, which comprises the following steps:
Traversing a host to be detected to obtain at least two host characteristics corresponding to a target host;
Inputting each host characteristic into a classifier of the same type as the target type in a decision model based on the target type of each host characteristic, and outputting first prediction return values corresponding to all ports in the target host; all classifiers in the decision model are constructed based on a naive Bayesian algorithm;
For each port, determining a target predicted return value corresponding to the port based on a first predicted return value with the same port; the target prediction return value is used for representing the opening probability corresponding to the port under the condition that the at least two host characteristics exist;
And determining the active port of the target host based on the target prediction return value corresponding to each port.
According to the host port activity prediction method provided by the invention, before at least two host characteristics corresponding to a target host are acquired, the method further comprises:
Acquiring a full-port scanning result and at least two sample host characteristics corresponding to a sample host; the sample host belongs to the host to be tested;
For each sample port in the sample host, determining a first probability that each sample port belongs to an inactive port and a second probability that each sample port belongs to an active port based on the full port scan result, and determining a third probability that each sample host feature exists if each sample port belongs to an inactive port and a fourth probability that each sample host feature exists if each sample port belongs to an active port;
Initializing the decision model based on the first probability, the second probability, each of the third probabilities, and each of the fourth probabilities.
According to the method for predicting the activity of the host port provided by the invention, based on the target type of each host feature, each host feature is input into a classifier of the same type as the target type in a decision model, and a first prediction return value corresponding to each port in the target host is output, including:
For each port in the target host, under the condition that the target type of each host feature is a local feature, inputting each host feature into a local classifier in the decision model, determining a first prediction return value corresponding to the port based on the first probability and the second probability corresponding to the port and the third probability and the fourth probability corresponding to each of all host features input into the local classifier, and outputting the first prediction return value corresponding to each of all ports in the target host;
Under the condition that the target type of each host feature is a global feature, inputting each host feature into a global classifier in the decision model, determining a first prediction return value corresponding to the port based on the first probability and the second probability corresponding to the port and the third probability and the fourth probability corresponding to all host features in the global classifier, and outputting the first prediction return values corresponding to all ports in the target host.
According to the method for predicting the activity of the host port provided by the invention, the determining the active port of the target host based on the target prediction return values corresponding to all the ports respectively comprises the following steps:
Acquiring target scanning times corresponding to each port;
determining a predicted value corresponding to each port based on the target scanning times corresponding to each port and the target predicted return value;
And arranging all the predicted values in a descending order, and determining at least one port corresponding to the maximum predicted value as an active port of the target host.
According to the method for predicting the activity of the host port provided by the invention, the method for determining the predicted value corresponding to each port based on the target scanning times corresponding to each port and the target predicted return value comprises the following steps:
for each port, acquiring the total scanning times of scanning all ports on the host to be tested;
determining target uncertainty corresponding to the port based on the total scanning times and target scanning times corresponding to the port;
and determining a predicted value corresponding to the port based on the target predicted return value and the target uncertainty.
According to the host port activity prediction method provided by the invention, the method further comprises the following steps:
inputting the active port into a scanner for scanning;
updating the target scanning times corresponding to the active port, and updating the target uncertainty corresponding to the active port based on the target scanning times;
updating the first probability, the second probability, the third probability, and the fourth probability based on the scan results of the active ports to update the decision model.
According to the host port activity prediction method provided by the invention, the method further comprises the following steps:
and calling at least two decision models by utilizing multiple processes, respectively determining active ports in different target hosts in the host to be tested, wherein each decision model adopts sparse matrix representation.
The invention also provides a host port activity prediction device, which comprises:
The acquisition module is used for traversing the host to be detected and acquiring at least two host characteristics corresponding to the target host;
The prediction module is used for inputting each host characteristic into a classifier of the same type as the target type in the decision model based on the target type of each host characteristic, and outputting first prediction return values corresponding to all ports in the target host; all classifiers in the decision model are constructed based on a naive Bayesian algorithm;
the first determining module is used for determining a target prediction return value corresponding to each port based on the first prediction return value with the same port; the target prediction return value is used for representing the opening probability corresponding to the port under the condition that the at least two host characteristics exist;
And the second determining module is used for determining the active port of the target host based on the target prediction return value corresponding to each port.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize any one of the host port activity prediction methods.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a host port activity prediction method as described in any of the above.
According to the host port activity prediction method, device, equipment and storage medium provided by the invention, after at least two host features corresponding to a target host are obtained, the host features are input, according to the target type of each host feature, into classifiers in a decision model that are constructed with the naive Bayes algorithm, and the first prediction return value corresponding to each port is output. The target prediction return value corresponding to each port is then determined from the first prediction return values for the same port, which gives the opening probability of the port in the case that the at least two host features exist. Before all ports of the target host are scanned, inactive ports are excluded according to the target prediction return values and the active ports of the target host are determined, so that a whole-network host port scanning task with high coverage is completed using a small amount of scanning resources. On the basis of ensuring the prediction accuracy, prediction efficiency and prediction universality afforded by the naive Bayes algorithm, the scanning task is extended from common ports to all ports, improving the prediction coverage of whole-network port activity.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for predicting activity of a host port according to an embodiment of the present invention;
FIG. 2 is a second flowchart of a method for predicting activity of a host port according to an embodiment of the present invention;
FIG. 3 is a third flowchart illustrating a method for predicting activity of a host port according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a host port activity prediction apparatus according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Whole-network host port scanning handshakes with all ports on all active hosts in the whole network space using TCP (Transmission Control Protocol) packets or UDP (User Datagram Protocol) packets, and judges the activity of each port from the handshake result, that is, whether the port responds to an external connection request: if it responds, the port is an active port; if not, it is an inactive port. The purpose of whole-network host port scanning is to find all active ports in the whole network. However, most ports on a host are inactive, and scanning them wastes a large amount of scanning resources. The prior art therefore predicts port activity on the host: before scanning starts, the ports most likely to be active are preselected for scanning and the inactive ports are excluded, so that a whole-network host port scanning task with high coverage is completed using a small amount of scanning resources.
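The handshake-based activity check described above can be illustrated with a minimal sketch. It assumes a full TCP connect through the Python standard library rather than the half-open probes that high-speed scanners such as ZMap typically send, and the function name `is_port_active` is hypothetical.

```python
import socket


def is_port_active(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if (host, port) accepts a TCP connection.

    Assumption: a full connect stands in for the scanner's probe;
    this is an illustration, not the scanner used in the patent.
    """
    try:
        # create_connection completes the three-way handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat the port as inactive.
        return False
```

A call such as `is_port_active("192.0.2.10", 443)` would return a single activity label for one host and port pair; a whole-network scan repeats this for every pair, which is why pruning inactive ports in advance matters.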
In the prior art, the method for predicting the port activity on the host at least comprises the following two modes:
1) The purpose of port scanning includes determining active services or detecting the activity of hosts. According to the OSI (Open Systems Interconnection) seven-layer model, active services must run on active ports, and the overhead of port scanning is much lower than that of service scanning. Thus, a port scan may be performed before a service scan to exclude invalid service scan targets, i.e., inactive ports. In addition, IANA (the Internet Assigned Numbers Authority) assigns a default port number on the host for each class of service in order to standardize port usage on hosts. Therefore, when the purpose of port scanning is to determine active services, generally only the default ports assigned to the services to be scanned are scanned, and the subsequent service scan is performed only on the active ports found in the port scan; all services not running on their default ports are missed, so the scanning result is incomplete. When the purpose of port scanning is to detect the activity of the host, the activity of the host is generally determined by combining ICMP (Internet Control Message Protocol) packets with the results of scanning the ports in a common port list, but this method cannot be applied to hosts with no common port open, and its range of application is therefore quite limited.
2) Based on a machine learning method, an independent XGBoost classifier is trained for each of 20 common ports, and the XGBoost classifier of each port predicts whether the corresponding port is active on the host, using the geographic location of the target host, its autonomous system, and the known active port numbers in that autonomous system as input data. However, this method is suitable only for common ports; for uncommon ports, a large amount of training data cannot be provided for training the classifier.
To address the low coverage and narrow range of application of port activity prediction methods in the prior art, an embodiment of the present invention provides a host port activity prediction method. FIG. 1 is the first flowchart of the host port activity prediction method provided in the embodiment of the present invention. As shown in FIG. 1, the method includes:
step 110, traversing the host to be tested, and obtaining at least two host characteristics corresponding to the target host.
Optionally, the hosts to be tested may be understood as all hosts in the network space that need to be scanned, and the target host may be at least one of the hosts to be tested on which port scanning has not yet been performed. All hosts among the hosts to be tested that have not been port-scanned may be numbered, and the target hosts are traversed in sequence by number: after port scanning of the current target host is completed, the number is incremented by 1 and the next target host is obtained.
Optionally, the at least two host features corresponding to the target host are features associated with the host address of the target host, and may include: the subnet in which the address is located, the ASN (Autonomous System Number), the active port, and the organization name, where the active port is the port number of a port already known to be active on the target host. The embodiment of the present invention does not limit this.
In addition, before acquiring at least two host features corresponding to the target host, the method further includes:
Acquiring a full-port scanning result and at least two sample host characteristics corresponding to a sample host; the sample host belongs to the host to be tested;
For each sample port in the sample host, determining a first probability that each sample port belongs to an inactive port and a second probability that each sample port belongs to an active port based on the full port scan result, and determining a third probability that each sample host feature exists if each sample port belongs to an inactive port and a fourth probability that each sample host feature exists if each sample port belongs to an active port;
Initializing the decision model based on the first probability, the second probability, each of the third probabilities, and each of the fourth probabilities.
Specifically, before the target host and the at least two host features corresponding to the target host are determined, the hosts to be tested are randomly sampled to obtain a sample host and at least two sample host features corresponding to the sample host. The sample host is input into a scanner, and all ports of the sample host are scanned to obtain a full-port scanning result corresponding to the sample host. The full-port scanning result includes an activity scanning result for each port in the sample host, that is, whether each port in the sample host is an active port can be judged from the full-port scanning result.
After the full-port scanning result is determined, the following are calculated for each sample port from the full-port scanning result corresponding to the sample host: a first probability $P(c=0)$ that the sample port belongs to an inactive port and a second probability $P(c=1)$ that the sample port belongs to an active port, both without considering the sample host features; and, at the same time, a third probability $P(x_i \mid c=0)$ that the sample host feature $x_i$ exists if the sample port belongs to an inactive port and a fourth probability $P(x_i \mid c=1)$ that the sample host feature $x_i$ exists if the sample port belongs to an active port. Here $c$ represents the class: $c=0$ represents the inactive-port class and $c=1$ represents the active-port class. Other symbols may be used to represent the classes, for example $c=-1$ for the inactive-port class and $c=1$ for the active-port class, which is not limited in this embodiment of the present invention. $x_i$ represents the i-th sample host feature, $i = 1, \dots, n$, and $n$ represents the total number of sample host features.
After the first probability, the second probability, the third probability and the fourth probability are obtained through calculation, initializing the decision model by using the four types of probabilities, namely, taking the four types of probabilities as prior probabilities of all classifiers in the decision model, and facilitating the prediction of port activity of other target hosts which do not carry out port scanning.
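The initialization described above can be sketched as frequency counting over the sampled hosts. This is a minimal illustration under stated assumptions: the sample layout (a set of feature strings plus a set of active ports per host), the function names, and the choice to return 0.0 when a class has no samples are all hypothetical, and the patent does not state whether any smoothing is applied.

```python
from collections import Counter


def init_decision_model(samples):
    """samples: list of (features, active_ports) pairs, one per sample host.

    features: set of hashable host features (assumed encoding, e.g. "asn:1").
    active_ports: set of ports found active by the full-port scan.
    """
    active = Counter()  # port -> hosts on which the port is active
    feat = Counter()    # feature -> hosts that carry the feature
    joint = Counter()   # (port, feature) -> hosts with feature and active port
    for features, active_ports in samples:
        feat.update(features)
        active.update(active_ports)
        joint.update((p, f) for p in active_ports for f in features)
    return {"n": len(samples), "active": active, "feat": feat, "joint": joint}


def prior(model, port, c):
    """First (c=0) or second (c=1) probability for a port."""
    p_active = model["active"][port] / model["n"]
    return p_active if c == 1 else 1.0 - p_active


def likelihood(model, port, feature, c):
    """Third (c=0) or fourth (c=1) probability of a feature for a port."""
    a = model["active"][port]
    j = model["joint"][(port, feature)]
    if c == 1:
        num, den = j, a
    else:
        num, den = model["feat"][feature] - j, model["n"] - a
    # Assumption: an empty class yields 0.0 rather than a smoothed estimate.
    return num / den if den else 0.0
```

With four sample hosts of which two expose port 80, `prior(model, 80, 1)` evaluates to 0.5, and the likelihoods are the fractions of active or inactive hosts carrying each feature.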
It should be noted that the sample host may include at least one host, and the sample host features are the same as at least two host features corresponding to the target host.
Step 120, inputting each host characteristic into a classifier of the same type as the target type in a decision model based on the target type of each host characteristic, and outputting first prediction return values corresponding to all ports in the target host; all classifiers in the decision model are constructed based on a naive Bayesian algorithm.
Specifically, FIG. 2 is the second flowchart of the host port activity prediction method provided by the embodiment of the present invention. As shown in FIG. 2, because different host features have different ranges of application, a local classifier and a global classifier are built in the decision model according to the types of the host features. Each host feature is input into the corresponding classifier in the decision model according to its target type, and the classifier calculates a first prediction return value for each port. The first prediction return value represents the opening probability of the port in the case that the host features input into that classifier exist; these are only part of all the host features corresponding to the target host. For example, if the host feature is an organization name, it is input into the global classifier, and the opening probability corresponding to the port in the case that the organization name exists is calculated.
It should be noted that the local classifier and the global classifier in the decision model are both constructed based on the naive Bayes algorithm and differ only in the target types of the host features they take as input. In addition, among the host features, the subnet in which the address is located and the active port are local features, while the ASN and the organization name are global features; therefore the subnet and the active port are input into the local classifier, and the ASN and the organization name are input into the global classifier.
In addition, the local classifier may include a classifier corresponding to each subnet, for example, as shown in fig. 2, the local classifier includes a classifier of a subnet a and a classifier of a subnet b, and when the host characteristics of the target host include a subnet b, an active port, an ASN and an organization name, the subnet b and the active port are input to the classifier of the subnet b in the local classifier, and the ASN and the organization name are input to the global classifier.
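The routing of host features to classifiers by target type can be sketched as follows. The feature type labels (`"subnet"`, `"active_port"`, `"asn"`, `"org"`) and the function name are hypothetical; only the local/global split itself comes from the description above.

```python
# Assumed type labels; the local/global grouping follows the description.
LOCAL_TYPES = {"subnet", "active_port"}
GLOBAL_TYPES = {"asn", "org"}


def route_features(features):
    """Split (type, value) host features into local and global inputs.

    Returns (subnet_key, local_features, global_features), where
    subnet_key selects the per-subnet classifier inside the local classifier.
    """
    local = [f for f in features if f[0] in LOCAL_TYPES]
    global_ = [f for f in features if f[0] in GLOBAL_TYPES]
    subnet_key = next((v for t, v in local if t == "subnet"), None)
    return subnet_key, local, global_
```

For a host in subnet b with a known active port, an ASN and an organization name, the first two features go to the classifier of subnet b and the last two go to the global classifier.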
Further, the inputting each host feature into the classifier of the same type as the target type in the decision model based on the target type of each host feature, and outputting the first prediction return value corresponding to each of all ports in the target host, including:
For each port in the target host, under the condition that the target type of each host feature is a local feature, inputting each host feature into a local classifier in the decision model, determining a first prediction return value corresponding to the port based on the first probability and the second probability corresponding to the port and the third probability and the fourth probability corresponding to each of all host features input into the local classifier, and outputting the first prediction return value corresponding to each of all ports in the target host;
Under the condition that the target type of each host feature is a global feature, inputting each host feature into a global classifier in the decision model, determining a first prediction return value corresponding to the port based on the first probability and the second probability corresponding to the port and the third probability and the fourth probability corresponding to all host features in the global classifier, and outputting the first prediction return values corresponding to all ports in the target host.
Specifically, after the target type of each host feature is determined, all host features that are local features are input into the local classifier and all host features that are global features are input into the global classifier. The first predicted return value corresponding to each port is then calculated from the prior probabilities, that is, from the first probability, the second probability, the third probability and the fourth probability.
It should be noted that, for convenience of description, all host features belonging to a local feature are expressed as local host features, and all host features belonging to a global feature are expressed as global host features. The sum of the number of local host features and the number of global host features is equal to the number of all host features, i.e. the number n of all sample host features.
Taking the local classifier as an example, after all local host features are input into the local classifier, for each port, the first probability $P(c=0)$ that the port belongs to an inactive port, without considering the local host features, is obtained from the prior probabilities. Meanwhile, the third probability $P(x_j \mid c=0)$ that the local host feature $x_j$ exists if the port belongs to an inactive port is determined, where $x_j$ represents the j-th local host feature, $j = 1, \dots, m$, and $m$ represents the total number of local host features. Then, a fifth probability that the port belongs to an inactive port in the presence of all local host features is calculated using the naive Bayes formula. Likewise, the second probability $P(c=1)$ that the port belongs to an active port, without considering the local host features, is obtained from the prior probabilities, and the fourth probability $P(x_j \mid c=1)$ that the local host feature $x_j$ exists if the port belongs to an active port is determined. Then, a sixth probability that the port belongs to an active port in the presence of all local host features is calculated using the naive Bayes formula. The naive Bayes formula is shown as formula (1):

$$P(c \mid x_1, \dots, x_m) = \frac{P(c)\, P(x_1, \dots, x_m \mid c)}{P(x_1, \dots, x_m)} = \frac{P(c) \prod_{j=1}^{m} P(x_j \mid c)}{P(x_1, \dots, x_m)} \tag{1}$$

Wherein $P(x_1, \dots, x_m \mid c)$ represents the conditional probability that the $m$ local host features exist if the port belongs to class $c$; $P(c)$ represents the prior probability that the port belongs to class $c$ without considering the $m$ local host features, and includes the first probability $P(c=0)$ and the second probability $P(c=1)$; $P(x_1, \dots, x_m)$ represents the prior probability of the $m$ local host features and is a constant. Because the local host features are mutually independent, $P(x_1, \dots, x_m \mid c) = \prod_{j=1}^{m} P(x_j \mid c)$, where $P(x_j \mid c)$ includes the third probability $P(x_j \mid c=0)$ and the fourth probability $P(x_j \mid c=1)$. $P(c \mid x_1, \dots, x_m)$ represents the conditional probability that the port belongs to class $c$ in the presence of the $m$ local host features: when class $c$ is the inactive port, the fifth probability can be expressed as $P(c=0 \mid x_1, \dots, x_m)$; when class $c$ is the active port, the sixth probability can be expressed as $P(c=1 \mid x_1, \dots, x_m)$. $\prod$ represents a continued multiplication.
After the fifth probability and the sixth probability are calculated, the ratio of the fifth probability to the sixth probability is calculated using formula (2), and the reciprocal of the sum of this ratio and 1 is determined as the first predicted return value corresponding to the port. Formula (2) is as follows:

$$R_{\text{local}}(p) = \frac{1}{1 + \dfrac{P(c_p = 0 \mid x_1, \dots, x_m)}{P(c_p = 1 \mid x_1, \dots, x_m)}} \tag{2}$$

Wherein $R_{\text{local}}(p)$ represents the first predicted return value corresponding to port $p$ output by the local classifier, $c_p = 0$ represents that port $p$ belongs to an inactive port, and $c_p = 1$ represents that port $p$ belongs to an active port.
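Formulas (1) and (2) can be combined in a short sketch. Because the prior probability of the features appears in both the fifth and the sixth probability, it cancels in the ratio, so only the class priors and the per-feature conditional probabilities are needed. The function name and the handling of a zero active-class numerator are assumptions for illustration.

```python
import math


def first_return_value(prior_inactive, prior_active, lik_inactive, lik_active):
    """First predicted return value for one port from one classifier.

    lik_inactive / lik_active: the third / fourth probabilities, one per
    host feature input into the classifier.
    """
    # Numerators of formula (1); the shared denominator cancels in the ratio.
    num_inactive = prior_inactive * math.prod(lik_inactive)
    num_active = prior_active * math.prod(lik_active)
    if num_active == 0.0:
        # Assumption: no evidence for the active class gives a return value of 0.
        return 0.0
    # Formula (2): reciprocal of (1 + fifth probability / sixth probability).
    return 1.0 / (1.0 + num_inactive / num_active)
```

Algebraically this equals the sixth probability normalized by the sum of the fifth and sixth probabilities, so the return value lies between 0 and 1.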
For the global classifier, except that the input global host features are different from the local classifier, other steps are the same as those of the local classifier, and the local classifier can be referred to calculate the first prediction return value corresponding to each port output by the global classifier, which is not described in detail herein.
Further, the method further comprises:
and calling at least two decision models by utilizing multiple processes, respectively determining active ports in different target hosts in the host to be tested, wherein each decision model adopts sparse matrix representation.
Specifically, although the space of host ports is enormous, the ports actually active on each host are very few compared with all ports, so most of the target prediction return values reflected in the decision model are 0. The decision model and the return value list it outputs are therefore represented and stored as sparse matrices: only non-zero values are calculated and stored, and elements without data default to 0, which saves the resources that zero values would otherwise consume. Because zero values account for a high proportion of the whole decision model, sparse matrix representation greatly reduces the storage space and time complexity of the decision model, making it lightweight.
Secondly, because the prediction processes for ports on different hosts are mutually independent, different processes can call different decision models to predict the active ports of their respective target hosts, realizing parallel prediction that uses all of the computing power provided by a multi-core computing environment and shortening the prediction time for whole-network host ports. In addition, scanners (e.g., ZMap) can be run in parallel to scan multiple ports simultaneously, maximizing scanning efficiency.
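The sparse storage idea can be illustrated with a dictionary keyed by host and port, in which only non-zero return values occupy memory and every absent entry reads as 0. The class name and method names are hypothetical; the patent does not specify a concrete sparse format.

```python
class SparseReturnValues:
    """Dictionary-of-keys sparse store (an assumed, illustrative format)."""

    def __init__(self):
        self._data = {}  # (host, port) -> non-zero return value

    def set(self, host, port, value):
        if value == 0.0:
            # Zero values are never stored; drop any earlier entry.
            self._data.pop((host, port), None)
        else:
            self._data[(host, port)] = value

    def get(self, host, port):
        # Elements without data default to 0.
        return self._data.get((host, port), 0.0)

    def stored(self):
        return len(self._data)
```

With 65536 ports per host and only a handful active, the number of stored entries tracks the number of active ports rather than the size of the port space.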
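Because hosts are predicted independently, the work can be spread over a process pool. The sketch below is an assumption-laden stand-in: each task carries precomputed per-port return values instead of a real decision model, and `predict_host`, `predict_all` and the `top_k` parameter are hypothetical names.

```python
from concurrent.futures import ProcessPoolExecutor


def predict_host(task):
    """Rank the ports of one host by return value and keep the top_k.

    task: (host, {port: return_value}, top_k). The dictionary stands in
    for the output of a per-process decision model (assumption).
    """
    host, return_values, top_k = task
    ranked = sorted(return_values, key=return_values.get, reverse=True)
    return host, ranked[:top_k]


def predict_all(tasks, workers=4):
    """Predict several hosts in parallel, one task per host."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves task order; each task runs in a worker process.
        return dict(pool.map(predict_host, tasks))
```

When run as a script, `predict_all` should be called under an `if __name__ == "__main__":` guard so that worker processes can import the module safely on platforms that spawn rather than fork.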
Step 130, for each of the ports, determining a target prediction return value corresponding to the port based on the first prediction return values that correspond to the same port; the target prediction return value is used for representing the opening probability corresponding to the port under the condition that the at least two host characteristics exist.
Specifically, after the first prediction return values output by the different classifiers are determined, the first prediction return value output by the local classifier and the first prediction return value output by the global classifier for the same port may be weighted and summed to obtain the target prediction return value of each port. The target prediction return value can be understood as the opening probability of the port given all the host features.
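The weighted summation can be sketched as below. The weights `w_local` and `w_global` are hypothetical parameters (this section does not state their values), and a port missing from one classifier's output is assumed to contribute 0.

```python
from typing import Dict


def combine_return_values(
    local: Dict[int, float],
    global_: Dict[int, float],
    w_local: float = 0.5,   # assumed weight, not given in the source
    w_global: float = 0.5,  # assumed weight, not given in the source
) -> Dict[int, float]:
    """Weighted sum of the two classifiers' outputs, port by port."""
    ports = set(local) | set(global_)
    return {
        port: w_local * local.get(port, 0.0) + w_global * global_.get(port, 0.0)
        for port in ports
    }


local_out = {80: 0.8, 22: 0.2}
global_out = {80: 0.4, 443: 0.6}
target = combine_return_values(local_out, global_out)
```

With equal weights, port 80 receives roughly 0.6, port 443 roughly 0.3 and port 22 roughly 0.1, so a port supported by both classifiers ranks ahead of a port supported by only one.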
Step 140, determining the active port of the target host based on the target prediction return values corresponding to all the ports.
Specifically, after the target prediction return values of all ports in the target host are determined, a return value list is built from them, and the active port of the target host is determined from the list and scanned by a scanner. The full-port scanning task of the target host is thus completed with a small amount of scanning resources, which improves port scanning efficiency and accuracy.
Further, fig. 3 is a third flowchart of the host port activity prediction method provided by an embodiment of the present invention. As shown in fig. 3, the determining an active port of the target host based on the target prediction return values corresponding to all the ports includes:
Acquiring target scanning times corresponding to each port;
determining a predicted value corresponding to each port based on the target scanning times corresponding to each port and the target predicted return value;
And arranging all the predicted values in a descending order, and determining at least one port corresponding to the maximum predicted value as an active port of the target host.
Specifically, the decision model predicts using only prior probabilities calculated from randomly sampled sample hosts, and random sampling inevitably introduces bias. If the port with the highest target prediction return value in the return value list were always selected as the active port, active ports that do not appear on the sample hosts could be missed, lowering the coverage of full-port scanning. Therefore, in the embodiment of the present invention, after the return value list output by the decision model is obtained, that is, after the target prediction return values of all ports are obtained, the list is input into an exploration module. The exploration module adjusts the target prediction return values output by the decision model so as to explore the port space of the whole target host, that is, to probe the activity of ports that have been scanned less often and obtain more comprehensive information. Specifically, for each port in the target host, the target scanning times of the port are acquired, and the adjusted predicted value of the port is calculated from the target scanning times and the target prediction return value of the port by using the UCB1 (Upper Confidence Bound 1) algorithm; these operations are repeated to obtain the predicted values of all ports. All the predicted values are then arranged in descending order, and at least one port corresponding to the maximum predicted value in the sequence is determined as the active port of the target host.
Further, the determining the predicted value corresponding to each port based on the target scan number corresponding to each port and the target predicted return value includes:
for each port, acquiring the total scanning times of scanning all ports on the host to be tested;
determining target uncertainty corresponding to the port based on the total scanning times and target scanning times corresponding to the port;
and determining a predicted value corresponding to the port based on the target predicted return value and the target uncertainty.
Specifically, for each port, the total number of scans of all ports on the host to be tested can be obtained, and the target uncertainty of the port is calculated from the logarithm of the total number of scans and the target scanning times of the port by using formula (3), which in the UCB1 form reads:
$$U_i = c\sqrt{\frac{\ln N}{n_i}} \tag{3}$$
where $U_i$ represents the target uncertainty of port $i$, $c$ represents a preset hyperparameter controlling the depth of exploration, $N$ represents the total number of scans, and $n_i$ represents the target scanning times of port $i$.
After the target uncertainty of port $i$ is calculated, the sum of the target uncertainty and the target prediction return value of port $i$ is determined as the predicted value of port $i$, $V_i = R_i + U_i$, where $R_i$ represents the target prediction return value of port $i$ and $V_i$ represents the predicted value of port $i$. As can be seen from formula (3), the total number of scans is the same for every port, so the fewer target scans a port has, the greater its target uncertainty. Once this is combined with the target prediction return value of the port, the ordering of the ports may change, which allows the activity of less-scanned ports to be explored and improves the coverage of full-port scanning.
After the predicted values of all ports are calculated, they are arranged in descending order to obtain a predicted value list. As shown in fig. 3, the predicted value of the x-th port is the largest in the list and the predicted value of the y-th port is the second largest; the x-th port is therefore determined as the active port of the target host and is then scanned by the scanner. It should be noted that x and y are integers less than the total number of ports in the target host.
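The exploration step and the descending ranking can be sketched as follows. The sketch assumes the common UCB1 exploration term `c * sqrt(ln(N) / n)`; the function name, the default `c = 1.0`, and the choice to give never-scanned ports an infinite bonus are illustrative assumptions rather than details taken from the patent.

```python
import math
from typing import Dict, List, Tuple


def ucb1_ranking(
    return_values: Dict[int, float],
    scan_counts: Dict[int, int],
    c: float = 1.0,  # assumed exploration hyperparameter
) -> List[Tuple[int, float]]:
    """Rank ports by return value plus an exploration bonus, highest first."""
    total = sum(scan_counts.values())  # total scans over all ports
    scored = []
    for port, value in return_values.items():
        n = scan_counts.get(port, 0)
        if n == 0:
            bonus = math.inf  # assumption: unscanned ports are tried first
        else:
            bonus = c * math.sqrt(math.log(total) / n)
        scored.append((port, value + bonus))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


return_values = {80: 0.6, 22: 0.5, 8443: 0.3}
scan_counts = {80: 50, 22: 40, 8443: 1}
ranking = ucb1_ranking(return_values, scan_counts)
```

In this example port 8443 has the lowest return value but has been scanned only once, so its bonus lifts it to the top of the ranking; with `c = 0` the order falls back to the raw return values.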
Further, the method further comprises:
inputting the active port into a scanner for scanning;
updating the target scanning times corresponding to the active port, and updating the target uncertainty corresponding to the active port based on the target scanning times;
updating the first probability, the second probability, the third probability, and the fourth probability based on the scan results of the active ports to update the decision model.
Specifically, after the scanner scans the predicted active port, the target scanning times of that port are updated, and the target uncertainty of the port is updated accordingly, so that the exploration module is updated to balance the exploration range across ports. In addition, the first, second, third and fourth probabilities in the prior probability can be updated according to the scan result of the active port, thereby updating the decision model. For example, sample host 1, sample host 2 and target host 3 each include port 80. According to the full-port scan results of sample host 1 and sample host 2, port 80 is active on sample host 1 and inactive on sample host 2, so the first probability that port 80 is an inactive port is calculated as 1/2 and the second probability that port 80 is an active port is 1/2. Suppose port 80 is predicted to be the active port of target host 3 and, after the scanner scans port 80 of target host 3, the scan result agrees with the prediction. The first probability that port 80 is an inactive port is then updated to 1/3 and the second probability that port 80 is an active port is updated to 2/3, and the updated prior probability is used to predict the active port of the next target host. As the number of scanning rounds increases, the decision model uses more training data, its knowledge of host port activity comes closer to the real distribution of active ports, the sampling bias introduced by random sampling is gradually overcome, and the prediction accuracy for active ports keeps improving.
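The port 80 example can be reproduced with a small count-based sketch. It assumes the first and second probabilities are plain relative frequencies over the hosts observed so far, which gives 1/2 and 1/2 before the scan of target host 3 and 1/3 and 2/3 after it; the class `PortPrior` is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from typing import DefaultDict, Tuple


class PortPrior:
    """Hypothetical count-based prior for each port."""

    def __init__(self) -> None:
        self.active: DefaultDict[int, int] = defaultdict(int)
        self.observed: DefaultDict[int, int] = defaultdict(int)

    def record(self, port: int, is_active: bool) -> None:
        """Fold one scan result for a port into the counts."""
        self.observed[port] += 1
        if is_active:
            self.active[port] += 1

    def probabilities(self, port: int) -> Tuple[Fraction, Fraction]:
        """Return (first probability: inactive, second probability: active)."""
        n = self.observed.get(port, 0)
        if n == 0:
            raise ValueError(f"no observations for port {port}")
        p_active = Fraction(self.active.get(port, 0), n)
        return 1 - p_active, p_active


prior = PortPrior()
prior.record(80, True)    # sample host 1: port 80 active
prior.record(80, False)   # sample host 2: port 80 inactive
before = prior.probabilities(80)

prior.record(80, True)    # target host 3: scan confirms port 80 active
after = prior.probabilities(80)
```

Exact fractions are used here only so that the values match the worked example; a production model would more likely keep floating-point estimates.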
According to the host port activity prediction method provided by the embodiment of the present invention, after at least two host features corresponding to a target host are obtained, each host feature is input, according to its target type, into a classifier in the decision model that is constructed with the naive Bayes algorithm, and a first prediction return value is output for each port. The target prediction return value of each port is then determined from the first prediction return values corresponding to that port, which gives the opening probability of the port given the at least two host features. Before all ports of the target host are scanned, inactive ports are excluded according to the target prediction return values and the active ports of the target host are determined, so that a high-coverage port scanning task over hosts across the whole network is completed with a small amount of scanning resources. Because the prediction is based on the naive Bayes algorithm, prediction accuracy, efficiency and generality are maintained while the scanning task is extended from common ports to all ports, which improves the prediction coverage of port activity across the whole network.
The host port activity prediction device provided by the invention is described below, and the host port activity prediction device described below and the host port activity prediction method described above can be referred to correspondingly.
The embodiment of the present invention further provides a device for predicting activity of a host port, and fig. 4 is a schematic structural diagram of the device for predicting activity of a host port provided in the embodiment of the present invention, as shown in fig. 4, where the device 400 for predicting activity of a host port includes: an acquisition module 410, a prediction module 420, a first determination module 430, and a second determination module 440, wherein:
An obtaining module 410, configured to traverse a host to be tested and obtain at least two host features corresponding to a target host;
The prediction module 420 is configured to input, based on a target type of each host feature, each host feature into a classifier of the same type as the target type in the decision model, and output first prediction return values corresponding to all ports in the target host; all classifiers in the decision model are constructed based on a naive Bayesian algorithm;
A first determining module 430, configured to determine, for each of the ports, a target prediction return value corresponding to the port based on the first prediction return values that correspond to the same port; the target prediction return value is used for representing the opening probability corresponding to the port under the condition that the at least two host characteristics exist;
A second determining module 440, configured to determine an active port of the target host based on the target prediction return values corresponding to all the ports.
According to the host port activity prediction device provided by the embodiment of the present invention, after at least two host features corresponding to a target host are obtained, each host feature is input, according to its target type, into a classifier in the decision model that is constructed with the naive Bayes algorithm, and a first prediction return value is output for each port. The target prediction return value of each port is then determined from the first prediction return values corresponding to that port, which gives the opening probability of the port given the at least two host features. Before all ports of the target host are scanned, inactive ports are excluded according to the target prediction return values and the active ports of the target host are determined, so that a high-coverage port scanning task over hosts across the whole network is completed with a small amount of scanning resources. Because the prediction is based on the naive Bayes algorithm, prediction accuracy, efficiency and generality are maintained while the scanning task is extended from common ports to all ports, which improves the prediction coverage of port activity across the whole network.
Optionally, the host port activity prediction apparatus 400 further includes an initialization module, specifically configured to:
Acquiring a full-port scanning result and at least two sample host characteristics corresponding to a sample host; the sample host belongs to the host to be tested;
For each sample port in the sample host, determining a first probability that each sample port belongs to an inactive port and a second probability that each sample port belongs to an active port based on the full port scan result, and determining a third probability that each sample host feature exists if each sample port belongs to an inactive port and a fourth probability that each sample host feature exists if each sample port belongs to an active port;
Initializing the decision model based on the first probability, the second probability, each of the third probabilities, and each of the fourth probabilities.
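The initialization steps above amount to counting over the sample hosts. The sketch below estimates the four probability families for a single port as relative frequencies; the record layout, the feature strings and the function name are hypothetical, and no smoothing is applied, although Laplace smoothing is commonly added in practice to avoid zero probabilities.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical sample record: (host features, set of active ports).
Sample = Tuple[Set[str], Set[int]]


def init_port_model(samples: List[Sample], port: int) -> Dict[str, object]:
    """Estimate the four probability families for one port by counting."""
    if not samples:
        raise ValueError("at least one sample host is required")
    active = [features for features, ports in samples if port in ports]
    inactive = [features for features, ports in samples if port not in ports]
    total = len(samples)

    def feature_rates(groups: List[Set[str]]) -> Dict[str, float]:
        # Fraction of hosts in the group that carry each feature.
        counts = Counter(f for features in groups for f in features)
        return {f: n / len(groups) for f, n in counts.items()}

    return {
        "p_inactive": len(inactive) / total,         # first probability
        "p_active": len(active) / total,             # second probability
        "p_feat_inactive": feature_rates(inactive),  # third probabilities
        "p_feat_active": feature_rates(active),      # fourth probabilities
    }


samples = [
    ({"asn:64500", "os:linux"}, {22, 80}),
    ({"asn:64500", "os:windows"}, {3389}),
    ({"asn:64501", "os:linux"}, {80}),
    ({"asn:64501", "os:linux"}, set()),
]
model = init_port_model(samples, 80)
```

For port 80, two of the four sample hosts are active, so both priors are 0.5, and every active host carries the `os:linux` feature, so its fourth probability is 1.0.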
Optionally, the prediction module 420 is specifically configured to:
For each port in the target host, under the condition that the target type of each host feature is a local feature, inputting each host feature into a local classifier in the decision model, determining a first prediction return value corresponding to the port based on the first probability and the second probability corresponding to the port and the third probability and the fourth probability corresponding to each of all host features input into the local classifier, and outputting the first prediction return value corresponding to each of all ports in the target host;
Under the condition that the target type of each host feature is a global feature, inputting each host feature into a global classifier in the decision model, determining a first prediction return value corresponding to the port based on the first probability and the second probability corresponding to the port and the third probability and the fourth probability corresponding to all host features in the global classifier, and outputting the first prediction return values corresponding to all ports in the target host.
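A naive Bayes classifier of the kind referred to above combines the prior and the per-feature conditional probabilities by multiplication. The sketch below returns a normalized posterior for one port; whether the patent's first prediction return value is normalized this way is not stated in this section, so the normalization, the `floor` guard against zero probabilities and all names are assumptions.

```python
from typing import Dict, Iterable


def naive_bayes_return_value(
    p_inactive: float,
    p_active: float,
    p_feat_inactive: Dict[str, float],
    p_feat_active: Dict[str, float],
    features: Iterable[str],
    floor: float = 1e-6,  # assumed guard against zero probabilities
) -> float:
    """Posterior probability that a port is open given the host features."""
    score_active = p_active
    score_inactive = p_inactive
    for f in features:
        # Naive Bayes assumption: features are conditionally independent.
        score_active *= max(p_feat_active.get(f, 0.0), floor)
        score_inactive *= max(p_feat_inactive.get(f, 0.0), floor)
    evidence = score_active + score_inactive
    return score_active / evidence if evidence > 0 else 0.0


value = naive_bayes_return_value(
    p_inactive=0.5,
    p_active=0.5,
    p_feat_inactive={"os:linux": 0.5, "asn:64500": 0.5},
    p_feat_active={"os:linux": 1.0, "asn:64500": 0.5},
    features=["os:linux", "asn:64500"],
)
```

In this example the active score is 0.5 × 1.0 × 0.5 = 0.25 and the inactive score is 0.5 × 0.5 × 0.5 = 0.125, so the posterior is 0.25 / 0.375, about 0.667. The same function could serve both the local and the global classifier, with only the feature set differing.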
Optionally, the second determining module 440 is specifically configured to:
Acquiring target scanning times corresponding to each port;
determining a predicted value corresponding to each port based on the target scanning times corresponding to each port and the target predicted return value;
And arranging all the predicted values in a descending order, and determining at least one port corresponding to the maximum predicted value as an active port of the target host.
Optionally, the second determining module 440 is specifically configured to:
for each port, acquiring the total scanning times of scanning all ports on the host to be tested;
determining target uncertainty corresponding to the port based on the total scanning times and target scanning times corresponding to the port;
and determining a predicted value corresponding to the port based on the target predicted return value and the target uncertainty.
Optionally, the host port activity prediction apparatus 400 further includes a feedback module, which is specifically configured to:
inputting the active port into a scanner for scanning;
updating the target scanning times corresponding to the active port, and updating the target uncertainty corresponding to the active port based on the target scanning times;
updating the first probability, the second probability, the third probability, and the fourth probability based on the scan results of the active ports to update the decision model.
Optionally, the host port activity prediction apparatus 400 further includes a parallelization module, specifically configured to:
and calling at least two decision models by utilizing multiple processes, respectively determining active ports in different target hosts in the host to be tested, wherein each decision model adopts sparse matrix representation.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 5, the electronic device may include a processor 510, a communication interface 520, a memory 530, and a communication bus 540, wherein the processor 510, the communication interface 520 and the memory 530 communicate with each other through the communication bus 540. The processor 510 may invoke logic instructions in the memory 530 to perform a host port activity prediction method comprising:
Traversing a host to be tested to obtain at least two host characteristics corresponding to a target host;
Inputting each host characteristic into a classifier of the same type as the target type in a decision model based on the target type of each host characteristic, and outputting first prediction return values corresponding to all ports in the target host; all classifiers in the decision model are constructed based on a naive Bayesian algorithm;
For each port, determining a target prediction return value corresponding to the port based on the first prediction return values that correspond to the same port; the target prediction return value is used for representing the opening probability corresponding to the port under the condition that the at least two host characteristics exist;
And determining the active port of the target host based on the target prediction return value corresponding to each port.
Further, the logic instructions in the memory 530 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
In another aspect, the present invention also provides a computer program product comprising a computer program that can be stored on a non-transitory computer-readable storage medium; when the computer program is executed by a processor, the computer can execute the host port activity prediction method provided by the above embodiments, the method including:
Traversing a host to be tested to obtain at least two host characteristics corresponding to a target host;
Inputting each host characteristic into a classifier of the same type as the target type in a decision model based on the target type of each host characteristic, and outputting first prediction return values corresponding to all ports in the target host; all classifiers in the decision model are constructed based on a naive Bayesian algorithm;
For each port, determining a target prediction return value corresponding to the port based on the first prediction return values that correspond to the same port; the target prediction return value is used for representing the opening probability corresponding to the port under the condition that the at least two host characteristics exist;
And determining the active port of the target host based on the target prediction return value corresponding to each port.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the host port activity prediction method provided by the above embodiments, the method comprising:
Traversing a host to be tested to obtain at least two host characteristics corresponding to a target host;
Inputting each host characteristic into a classifier of the same type as the target type in a decision model based on the target type of each host characteristic, and outputting first prediction return values corresponding to all ports in the target host; all classifiers in the decision model are constructed based on a naive Bayesian algorithm;
For each port, determining a target prediction return value corresponding to the port based on the first prediction return values that correspond to the same port; the target prediction return value is used for representing the opening probability corresponding to the port under the condition that the at least two host characteristics exist;
And determining the active port of the target host based on the target prediction return value corresponding to each port.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for predicting activity of a host port, comprising:
Traversing a host to be tested to obtain at least two host characteristics corresponding to a target host;
Inputting each host characteristic into a classifier of the same type as the target type in a decision model based on the target type of each host characteristic, and outputting first prediction return values corresponding to all ports in the target host; all classifiers in the decision model are constructed based on a naive Bayesian algorithm;
For each port, determining a target prediction return value corresponding to the port based on the first prediction return values that correspond to the same port; the target prediction return value is used for representing the opening probability corresponding to the port under the condition that the at least two host characteristics exist;
And determining the active port of the target host based on the target prediction return value corresponding to each port.
2. The method of claim 1, wherein prior to obtaining at least two host characteristics corresponding to a target host, the method further comprises:
Acquiring a full-port scanning result and at least two sample host characteristics corresponding to a sample host; the sample host belongs to the host to be tested;
For each sample port in the sample host, determining a first probability that each sample port belongs to an inactive port and a second probability that each sample port belongs to an active port based on the full port scan result, and determining a third probability that each sample host feature exists if each sample port belongs to an inactive port and a fourth probability that each sample host feature exists if each sample port belongs to an active port;
Initializing the decision model based on the first probability, the second probability, each of the third probabilities, and each of the fourth probabilities.
3. The method for predicting activity of a host port according to claim 2, wherein the inputting each host feature into a classifier of the same type as the target type in a decision model based on the target type of each host feature, outputting first prediction return values corresponding to all ports in the target host, respectively, includes:
For each port in the target host, under the condition that the target type of each host feature is a local feature, inputting each host feature into a local classifier in the decision model, determining a first prediction return value corresponding to the port based on the first probability and the second probability corresponding to the port and the third probability and the fourth probability corresponding to each of all host features input into the local classifier, and outputting the first prediction return value corresponding to each of all ports in the target host;
Under the condition that the target type of each host feature is a global feature, inputting each host feature into a global classifier in the decision model, determining a first prediction return value corresponding to the port based on the first probability and the second probability corresponding to the port and the third probability and the fourth probability corresponding to all host features in the global classifier, and outputting the first prediction return values corresponding to all ports in the target host.
4. A method of predicting activity of a host port according to claim 2 or 3, wherein said determining an active port of the target host based on the target predicted return values corresponding to each of the ports comprises:
Acquiring target scanning times corresponding to each port;
determining a predicted value corresponding to each port based on the target scanning times corresponding to each port and the target predicted return value;
And arranging all the predicted values in a descending order, and determining at least one port corresponding to the maximum predicted value as an active port of the target host.
5. The method of claim 4, wherein determining the predicted value corresponding to each port based on the target scan number and the target predicted return value corresponding to each port comprises:
for each port, acquiring the total scanning times of scanning all ports on the host to be tested;
determining target uncertainty corresponding to the port based on the total scanning times and target scanning times corresponding to the port;
and determining a predicted value corresponding to the port based on the target predicted return value and the target uncertainty.
6. The method of claim 5, further comprising:
inputting the active port into a scanner for scanning;
updating the target scanning times corresponding to the active port, and updating the target uncertainty corresponding to the active port based on the target scanning times;
updating the first probability, the second probability, the third probability, and the fourth probability based on the scan results of the active ports to update the decision model.
7. A host port activity prediction method according to any one of claims 1-3, further comprising:
and calling at least two decision models by utilizing multiple processes, respectively determining active ports in different target hosts in the host to be tested, wherein each decision model adopts sparse matrix representation.
8. A host port activity prediction apparatus, comprising:
The acquisition module is used for traversing the host to be tested and acquiring at least two host characteristics corresponding to the target host;
The prediction module is used for inputting each host characteristic into a classifier of the same type as the target type in the decision model based on the target type of each host characteristic, and outputting first prediction return values corresponding to all ports in the target host; all classifiers in the decision model are constructed based on a naive Bayesian algorithm;
The first determining module is used for determining a target prediction return value corresponding to each port based on the first prediction return values that correspond to the same port; the target prediction return value is used for representing the opening probability corresponding to the port under the condition that the at least two host characteristics exist;
And the second determining module is used for determining the active port of the target host based on the target prediction return value corresponding to each port.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the host port activity prediction method of any one of claims 1-7 when the program is executed by the processor.
10. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the host port activity prediction method of any of claims 1-7.
CN202410385477.XA 2024-04-01 2024-04-01 Host port activity prediction method, device, equipment and storage medium Pending CN117972568A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410385477.XA CN117972568A (en) 2024-04-01 2024-04-01 Host port activity prediction method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117972568A true CN117972568A (en) 2024-05-03

Family

ID=90853823

Country Status (1)

Country Link
CN (1) CN117972568A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110471856A (en) * 2019-08-21 2019-11-19 大连海事大学 Software defect prediction method based on data imbalance
CN117082118A (en) * 2023-04-23 2023-11-17 深圳市广联计算有限公司 Network connection method based on data derivation and port prediction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YIRUI LUO et al.: "IPREDS: Efficient prediction system for Internet-wide port and service scanning", ACM, 31 March 2024 (2024-03-31), pages 1 - 24, XP059430472, DOI: 10.1145/3649470 *

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination