CN116846690B - IPv6 network space mapping method based on industry classification and probability model - Google Patents


Info

Publication number
CN116846690B
Authority
CN
China
Prior art keywords
port
industry
probability
ipv6
scanning
Prior art date
Legal status
Active
Application number
CN202311119847.7A
Other languages
Chinese (zh)
Other versions
CN116846690A (en)
Inventor
李澄清
谷泽伟
王陆陆
韩宁
Current Assignee
Xiangtan University
Original Assignee
Xiangtan University
Priority date
Filing date
Publication date
Application filed by Xiangtan University
Priority to CN202311119847.7A
Publication of CN116846690A
Application granted
Publication of CN116846690B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1433 Vulnerability analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification
    • H04L63/20 Network architectures or network communication protocols for network security for managing network security; network security policies in general
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols
    • H04L2101/00 Indexing scheme associated with group H04L61/00
    • H04L2101/60 Types of network addresses
    • H04L2101/618 Details of network addresses
    • H04L2101/659 Internet protocol version 6 [IPv6] addresses

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to an IPv6 network space mapping method based on industry classification and a probability model, comprising the following steps: obtaining organization-name samples from a public autonomous system (AS) database and labeling them; selecting a proportion of the samples as a data set and converting the organization names into word-vector representations; extracting feature vectors of the organization names and training a classification model with a naive Bayes classifier; classifying the AS organization names and labeling the IPv6 addresses with industry categories; during scanning, sampling addresses from each industry category according to a set threshold and performing full-port scanning on them; constructing a port-prediction probability model from the full-port scan data; performing port scanning on the addresses in the live IPv6 address library; and updating the probability model with the new data to complete the current round of scanning. The invention makes the detection and vulnerability assessment of network assets more flexible and efficient, and provides finer-grained IPv6 network asset data for network space asset mapping and vulnerability assessment systems.

Description

IPv6 network space mapping method based on industry classification and probability model
Technical Field
The invention belongs to the field of network space mapping, and in particular relates to an IPv6 network space mapping method based on industry classification and a probability model.
Background
With the rapid development of 5G and big-data technology, the industrial Internet is rising quickly and is strongly driving the digitization and intelligent upgrading of traditional manufacturing. Networked devices are widely used in intelligent manufacturing, for example in industrial control networks and smart agricultural production. Intelligent interconnection effectively improves industrial production efficiency, but it also brings serious security risks to network infrastructure, including the domain name system. Traditional security defenses struggle to resist today's complex network attacks, so moving from passive to active protection has become a consensus in the network security field. Real-time, comprehensive asset detection and awareness of the vulnerability status of networked devices are the basis of security threat analysis. In this context, network space mapping technology plays an important role: through network probing and data collection, it can accurately discover and identify network space infrastructure, users and network service assets, and analyze their properties at various spatial levels. However, when applied to the IPv6 space, existing network space mapping methods suffer from low scanning speed, limited asset information and wasted bandwidth during scanning.
Thus, there is a need in the art for a new network space mapping method.
Disclosure of Invention
To address the inaccurate asset information and inefficient vulnerability assessment of existing methods for scanning and mapping intelligent networked devices, the invention provides a method for scanning network asset ports in the IPv6 network space, which aims to improve vulnerability assessment efficiency and to reduce the high bandwidth consumption of traditional scanning methods.
To achieve this object, the invention provides an IPv6 network space mapping method based on industry classification and a probability model, characterized in that the method comprises at least the following 8 steps:
S1. Randomly sample organization names from a public autonomous system (AS) database according to a set threshold, and label each sample with one of five industry categories: education, cloud service provider, network operator, industrial enterprise and unaffiliated;
S2. Select a proportion of the sampled names as a data set, clean and preprocess the data by removing empty or duplicate AS organization names, and convert the remaining names into word-vector representations;
S3. Based on the word vectors of the AS organization names, classify the names by industry using a naive Bayes model as the classifier;
Step S3 comprises the following sub-steps:
S3a. Compute the prior probability of each industry category. Let the training set be
$$T=\{(x_1,y_1),(x_2,y_2),\dots,(x_m,y_m)\},$$
where $x_i=(x_i^{(1)},x_i^{(2)},\dots,x_i^{(n)})^{\mathsf T}$ is the feature vector of the $i$-th training sample, $n$ is the dimension of the feature vector (the number of features), $m$ is the size of the training set, and $y_i\in\{C_1,C_2,C_3,C_4,C_5\}$, where $C_1,\dots,C_5$ correspond to the five industry categories education, cloud service provider, network operator, industrial enterprise and unaffiliated organization. The prior probability of industry category $C_k$ is
$$P(Y=C_k)=\frac{1}{m}\sum_{i=1}^{m}I(y_i=C_k),\qquad k=1,2,\dots,5,$$
where the indicator function $I(\cdot)$ returns 1 when its condition is true and 0 otherwise. The prior probability that the $j$-th feature $x^{(j)}$ is present is
$$P(X^{(j)}=1)=\frac{1}{m}\sum_{i=1}^{m}I(x_i^{(j)}=1).$$
S3b. Compute the conditional probability of each feature, i.e. the probability that the $j$-th dimension of the feature vector equals 1 given industry category $C_k$:
$$P(X^{(j)}=1\mid Y=C_k)=\frac{\sum_{i=1}^{m}I(x_i^{(j)}=1,\;y_i=C_k)}{\sum_{i=1}^{m}I(y_i=C_k)},$$
where $I(x_i^{(j)}=1,\;y_i=C_k)$ returns 1 only when both conditions are true and 0 otherwise; correspondingly, $P(X^{(j)}=0\mid Y=C_k)=1-P(X^{(j)}=1\mid Y=C_k)$.
S3c. Compute the posterior probability of industry category $C_k$ given feature vector $x$:
$$P(Y=C_k\mid X=x)=\frac{P(Y=C_k)\prod_{j=1}^{n}P(X^{(j)}=x^{(j)}\mid Y=C_k)}{\prod_{j=1}^{n}P(X^{(j)}=x^{(j)})}.$$
The feature vector $x$ is then assigned to the category with the maximum posterior probability:
$$y=\arg\max_{C_k}P(Y=C_k)\prod_{j=1}^{n}P(X^{(j)}=x^{(j)}\mid Y=C_k),$$
where the denominator is dropped because it does not depend on $C_k$.
S4. Obtain a live IPv6 address library from the Internet and label each IPv6 address with an industry category according to the public AS database;
S5. From the industry-labeled IPv6 addresses, sample addresses from every industry category in a set proportion, and perform full-port scanning on the sampled addresses;
S6. Based on the full-port scan data, compute the conditional probability of each network port being open in each industry category, and construct a probability model for predicting each industry's open ports;
S7. Based on the resulting probability model, predict the open ports of every address in the live IPv6 address library according to its industry category, and perform port scanning and security vulnerability assessment;
S8. Update the parameters of the Bayesian network (the probability model) with the data obtained from port scanning, ending the current round of scanning; in the next round, resample addresses, perform full-port scanning and update the Bayesian network again to obtain more accurate information on open network ports.
In a specific embodiment, step S2 comprises the following sub-steps:
S2a. Replace the symbols contained in organization names with spaces, separate runs of English letters from long runs of digits with spaces, and remove empty or duplicate AS organization names, so that the organization names have a uniform format;
S2b. Extract the keywords of the organization names with a bag-of-words model and build feature vectors: each organization name corresponds to a 1 × wordNum vector, where wordNum is the number of keywords counted over all organization names;
S2c. Store the resulting text feature vectors as a bag-of-words feature matrix. This is a two-dimensional matrix in which each row represents an organization name and each column corresponds to a keyword; each element is the number of times that keyword occurs in the corresponding organization name. The matrix is used for subsequent classifier training and industry classification.
In the present invention, the "keywords" of step S2b are illustrated as follows. For example, one AS organization name of the China Education and Research Network (CERNET) is "FITIA-AS-BKB China Education and Research Network CERNET", from which the keywords China, Education, Research, Network and CERNET are extracted; one AS organization name of China Mobile is "CMI-INT-AS China Mobile International Limited", from which the keywords China, Mobile, International and Limited are extracted.
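The keyword extraction and bag-of-words matrix of S2b and S2c can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: every lower-cased alphabetic or numeric run is treated as a keyword, and no stop-word or registry-handle filtering is applied, because the patent does not say how tokens such as "FITIA", "AS" or "and" are excluded from its keyword lists. The function names `tokenize` and `bag_of_words` are hypothetical.

```python
import re


def tokenize(name):
    """Lower-case the name and split it into letter runs and digit runs.

    Symbols are discarded and letters are separated from digits, so
    "CMI-INT-AS9808" becomes ["cmi", "int", "as", "9808"].
    """
    return re.findall(r"[a-z]+|[0-9]+", name.lower())


def bag_of_words(names):
    """Return (vocabulary, matrix): one row per name, one column per keyword."""
    tokenized = [tokenize(n) for n in names]
    vocab = sorted({w for words in tokenized for w in words})
    index = {w: j for j, w in enumerate(vocab)}
    matrix = []
    for words in tokenized:
        row = [0] * len(vocab)  # 1 x wordNum vector for this name
        for w in words:
            row[index[w]] += 1  # occurrence count of keyword w in this name
        matrix.append(row)
    return vocab, matrix
```

The matrix holds occurrence counts as described in S2c; for the binary presence features assumed by the naive Bayes formulas of S3, each count can be clipped to 1, for example with `[min(c, 1) for c in row]`.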
In a specific embodiment, step S4 comprises the following sub-steps:
S4a. Build a hierarchical dictionary tree (trie) from the IPv6 address prefixes of every AS number, taking bit 0 as the left child and bit 1 as the right child, and store the corresponding AS number in the trie;
S4b. Starting from the root node, match the IPv6 address against the stored prefixes bit by bit to find the deepest matching node, read the AS number stored at that node, and label the IPv6 address with the industry category of that AS.
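The prefix lookup of S4a and S4b amounts to a longest-prefix match over a binary trie. The sketch below assumes prefixes are given in CIDR notation and uses Python's standard `ipaddress` module; children are stored under the keys "0" (left) and "1" (right), and `PrefixTrie` is a hypothetical name. The example prefixes and AS numbers come from the documentation ranges (2001:db8::/32 and AS64496 to AS64511), not from real data.

```python
import ipaddress


class PrefixTrie:
    """Binary trie of IPv6 prefixes: key "0" is the left child, key "1" the right child."""

    def __init__(self):
        self.root = {}

    def insert(self, prefix, asn):
        """Store asn at the node reached by following the prefix bits from the root."""
        net = ipaddress.IPv6Network(prefix, strict=False)
        bits = format(int(net.network_address), "0128b")[: net.prefixlen]
        node = self.root
        for bit in bits:
            node = node.setdefault(bit, {})
        node["asn"] = asn

    def lookup(self, address):
        """Return the ASN of the longest stored prefix that matches address, or None."""
        bits = format(int(ipaddress.IPv6Address(address)), "0128b")
        node, best = self.root, self.root.get("asn")
        for bit in bits:
            node = node.get(bit)
            if node is None:  # no deeper node matches this address
                break
            best = node.get("asn", best)
        return best
```

For example, after inserting 2001:db8::/32 for AS64500 and 2001:db8:1::/48 for AS64501, the address 2001:db8:1::5 resolves to AS64501, while 2001:db8:2::1 falls back to the shorter /32 prefix and resolves to AS64500.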
In a specific embodiment, step S5 comprises the following sub-steps:
S5a. Sample the IPv6 addresses of each industry category to obtain a set number of IPv6 address samples, run full-port probing on all samples with the ZMapv6 tool, discard hosts with more than 100 open ports, and store the detected open-port information in the database;
S5b. Use the Masscan tool to collect the service, system version and protocol information of every open port of each IPv6 address in the database, and store this information in the database.
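The sampling and filtering parts of S5a can be sketched without invoking ZMapv6 or Masscan, whose command lines the patent does not give. The sketch assumes that addresses are held as lists per industry and that scan results are already available as a mapping from host to its set of open ports; the function names, the `ratio` parameter and the rule of keeping at least one address per non-empty industry are illustrative assumptions. Hosts reporting more than 100 open ports are dropped as in S5a; a plausible reason, not stated in the patent, is that such hosts are often firewalls or middleboxes that answer on every port and would distort the port statistics.

```python
import random


def sample_per_industry(addresses_by_industry, ratio, seed=None):
    """Randomly draw about ratio * N addresses (at least one, at most N) per industry.

    addresses_by_industry maps an industry label to a list of IPv6 address strings.
    """
    rng = random.Random(seed)
    samples = {}
    for industry, addrs in addresses_by_industry.items():
        k = min(len(addrs), max(1, round(ratio * len(addrs))))
        samples[industry] = rng.sample(addrs, k)
    return samples


def drop_noisy_hosts(open_ports, max_open=100):
    """Keep only hosts whose number of open ports does not exceed max_open."""
    return {host: ports for host, ports in open_ports.items() if len(ports) <= max_open}
```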
In a specific embodiment, step S6 comprises the following sub-steps:
S6a. To build the port-prediction probability model of each industry category, first compute three kinds of conditional probability from the collected IPv6 full-port scan data: (1) the transport-layer feature $P_1(\text{port}_b \mid \text{port}_a\ \text{open})$, the probability that port $b$ is open given that port $a$ on the same host is open; (2) the class-1 combined transport/application-layer feature $P_2(\text{port}_b \mid (\text{port}_a\ \text{open}, \text{protocol}_k))$, the probability that port $b$ is open given that port $a$ is open and the protocol on port $a$ has the specific value $\text{protocol}_k$; (3) the class-2 combined transport/application-layer feature $P_3(\text{port}_b \mid (\text{port}_a\ \text{open}, \text{banner}))$, the probability that port $b$ is open given that port $a$ is open and the host's response message contains a specific banner;
S6b. Store the three conditional probabilities of S6a in the database in descending order, each as a tuple of the probability condition and the probability result, thereby constructing the port-prediction probability model.
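Of the three features in S6a, the transport-layer feature $P_1$ can be estimated directly from per-host open-port sets by counting co-occurrences, as in this minimal sketch (hypothetical function name; one call per industry category). $P_2$ and $P_3$ would follow the same pattern with the condition extended by the protocol or banner observed on port $a$; that extension is not shown. The output is a list of ((condition, result), probability) tuples sorted in descending order of probability, mirroring the storage format described in S6b.

```python
from collections import Counter
from itertools import permutations


def build_port_model(scans):
    """Estimate P1 = P(port_b open | port_a open) from one industry's scan data.

    scans: iterable of sets of open ports, one set per host.
    Returns [((port_a, port_b), probability), ...] sorted by descending probability.
    """
    single = Counter()  # number of hosts with port_a open
    pair = Counter()    # number of hosts with both port_a and port_b open
    for ports in scans:
        single.update(ports)
        pair.update(permutations(ports, 2))  # ordered pairs (a, b) with a != b
    model = [((a, b), n_ab / single[a]) for (a, b), n_ab in pair.items()]
    model.sort(key=lambda item: item[1], reverse=True)
    return model
```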
In a specific embodiment, step S7 comprises the following sub-steps:
S7a. Scan the 20 most commonly used ports on all addresses remaining after sampling, store the results in the database, and build an open-port vector over these 20 ports;
S7b. Using each host's open-port vector over the 20 common ports, look up the conditional probabilities in the port-prediction probability model of the host's industry, select the 1500 ports with the highest conditional probabilities, scan them, store the results in the database, and end the current round of scanning.
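S7b can be sketched as ranking candidate ports by conditional probability. The patent does not say how the probabilities from several known open ports are combined; this sketch assumes each candidate port is scored by the largest $P_1$ value over the host's open common ports, with ties broken by port number. The `model` argument is a dictionary keyed by (port a, port b) pairs, and all names are hypothetical.

```python
def predict_ports(open_common, model, top_k=1500):
    """Return up to top_k candidate ports for one host, most probable first.

    open_common: set of ports found open among the common ports (S7a).
    model: {(port_a, port_b): P(port_b open | port_a open)} for the host's industry.
    """
    scores = {}
    for (a, b), p in model.items():
        # Only condition on ports known to be open, and skip ports already known.
        if a in open_common and b not in open_common:
            scores[b] = max(scores.get(b, 0.0), p)
    ranked = sorted(scores, key=lambda port: (-scores[port], port))
    return ranked[:top_k]
```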
In a specific embodiment, step S8 comprises the following sub-steps:
S8a. After each round of scanning, recompute the three conditional probabilities of S6a from the open-port data of each of the five industries, and update the port-prediction probability model;
S8b. Randomly sample a new batch of addresses for full-port scanning, and update the port-prediction probability model again.
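One way to make the S8 update cheap is to keep raw counts rather than probabilities, so that each round only adds the new observations and the conditional probabilities are recomputed on demand. The sketch below does this for $P_1$ only; the class name is hypothetical, and it weights all rounds equally, whereas the patent does not specify whether older rounds are kept, discounted or discarded.

```python
from collections import Counter
from itertools import permutations


class PortModel:
    """Keeps raw counts so that P1 can be refreshed after every scan round."""

    def __init__(self):
        self.single = Counter()  # hosts with port a open
        self.pair = Counter()    # hosts with both port a and port b open

    def update(self, scans):
        """Fold a new batch of per-host open-port sets into the counts."""
        for ports in scans:
            self.single.update(ports)
            self.pair.update(permutations(ports, 2))

    def prob(self, a, b):
        """Current estimate of P(port b open | port a open); 0.0 if a was never seen."""
        return self.pair[(a, b)] / self.single[a] if self.single[a] else 0.0
```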
The present invention also provides a computer readable storage medium storing a computer program which when executed by a processor implements an IPv6 network space mapping method based on industry classification and probability model as described above.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the IPv6 network space mapping method based on industry classification and probability model as described above when executing the computer program.
Using the port-prediction probability model to drive port scanning of devices in the IPv6 network space yields a network space asset mapping and vulnerability detection system that can probe the IPv6 network space by industry.
Through multiple rounds of scanning, the invention achieves comprehensive detection of known IPv6 addresses in the IPv6 network environment, acquires real-time data more efficiently, improves the efficiency of detecting network devices and their security status, and reduces scanning bandwidth consumption.
In general, through the technical scheme of the invention, the following beneficial effects can be obtained:
1) By classifying ASes by industry in advance and building an AS organization classification model, the IPv6 addresses belonging to each AS are labeled with an industry, which enables IPv6 network space detection by industry and makes network asset detection and vulnerability assessment more flexible and efficient.
2) Scanning with a port probability model built for IPv6 network space detection finds the ports that need vulnerability scanning more efficiently, for example when probing the cloud service provider industry, and reduces network bandwidth and computing resource consumption.
3) Because scanning is organized by industry classification, asset information in the IPv6 network space is sorted by industry right from the source of data acquisition, which provides finer IPv6 network asset data for network space asset mapping and vulnerability assessment systems.
Advantages of the invention will be set forth in the detailed description which follows, and in part will be apparent from the description.
Drawings
Advantages of the invention will be illustrated by the following description of embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is an overall flow chart of an IPv6 network space mapping method based on industry classification and probability models according to the present invention.
FIG. 2 is a schematic diagram of the AS organization industry classification method, i.e. steps S1 to S3, according to an embodiment of the invention.
FIG. 3 is a schematic diagram of building the dictionary tree (trie) of IPv6 address prefixes corresponding to AS numbers, i.e. step S4, according to an embodiment of the invention.
FIG. 4 is a schematic diagram of the port scan probability model constructed during scanning, i.e. steps S5 to S8, according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail with reference to the following examples. The specific embodiments described herein are merely illustrative of the invention and are not intended to limit it. In addition, the technical features of the embodiments described below may be combined with each other as long as they do not conflict.
Example 1
This embodiment discloses the AS organization industry classification method used in the IPv6 network space mapping method based on industry classification and a probability model; see FIG. 2. It corresponds to steps S1 to S3 in FIG. 1.
S100. Acquire AS information.
Specifically, AS information can be obtained from a public AS information database on the Internet. It contains the AS number, the organization name and the managed IPv6 address segments, each segment given by a start address and an end address. The acquired data is converted to TSV (tab-separated values) format, a plain-text file that stores tabular data with tab-separated fields, so that it can be conveniently extracted and used later.
Specifically, organization names are randomly sampled according to a set threshold, which can be adjusted to the size of the data: the smaller the data set, the larger the threshold. The sampled names are then labeled with one of five industry categories: education, cloud service provider, network operator, industrial enterprise and unaffiliated, which realizes the industry classification of the organization names. These five categories are chosen because IPv6 addresses are concentrated in these industries, which makes it practical to scan them and to find the port-opening patterns of IPv6 addresses in each industry, from which the port-prediction probability model is built in later steps.
Preferably, the data sampling portion of this embodiment may be implemented by using a random data sampling function library in Python language, and the set threshold may be selected according to data distribution, so that the data distribution is as uniform as possible, so as to reflect a real situation.
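The TSV parsing and random sampling described in S100 can be sketched with Python's standard `csv` and `random` modules. The column order (AS number, organization name, start address, end address) and the function name are assumptions for illustration; the patent names the fields but not their layout.

```python
import csv
import io
import random


def sample_as_records(tsv_text, ratio, seed=None):
    """Parse AS records from TSV text and randomly sample a fraction of them.

    Assumed (hypothetical) column order: asn, org_name, start_addr, end_addr.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    records = [
        {"asn": int(row[0]), "org_name": row[1], "start": row[2], "end": row[3]}
        for row in reader
        if row  # skip blank lines
    ]
    # Sample about ratio * N records, at least one and at most N.
    k = min(len(records), max(1, round(ratio * len(records))))
    return random.Random(seed).sample(records, k)
```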
Preferably, the data labeling of the sample can be performed according to information in an internet enterprise information base, which is beneficial to improving the accuracy of the data and the classification result.
S110. Clean the AS information; this step mainly consists of de-duplication and symbol processing.
Specifically, first, symbols contained in organization names hurt industry classification, so they are removed by replacing them with spaces. Second, strings that combine long runs of English letters and long runs of digits degrade the feature vectors: many digit strings are merely administrative codes of companies and are unrelated to industry category, so they would distort the classification result; therefore the English letters and long digit runs within such strings are separated by spaces. Third, many AS organization names in the AS information database are empty, and one organization name may have several AS numbers; this useless and redundant data is removed, i.e. empty or duplicate AS organization names are dropped so that the organization names have a uniform format.
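The three cleaning rules of S110 can be sketched as follows. This is a minimal illustration with a hypothetical function name: regular expressions replace symbols with spaces and insert a space at every letter/digit boundary, and duplicate names are merged so that an organization with several AS numbers appears once. Unlike the patent's description, it splits at every letter/digit boundary rather than only around long strings, since no length threshold is given.

```python
import re


def clean_org_names(records):
    """records: iterable of (asn, org_name). Returns {cleaned_name: [asns]}."""
    cleaned = {}
    for asn, name in records:
        # Rule 1: replace every run of symbols with a single space.
        name = re.sub(r"[^A-Za-z0-9]+", " ", name or "")
        # Rule 2: insert a space wherever letters and digits meet.
        name = re.sub(r"(?<=[A-Za-z])(?=[0-9])|(?<=[0-9])(?=[A-Za-z])", " ", name)
        name = " ".join(name.split())
        # Rule 3: drop empty names and merge duplicates (one name, several ASNs).
        if name:
            cleaned.setdefault(name, []).append(asn)
    return cleaned
```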
S120. Convert the AS organization names into word feature vectors.
Specifically, keywords of the organization names are extracted with a bag-of-words model to build feature vectors: each organization name is treated as a set of words, and whether each word appears in the name is recorded as the value of the corresponding feature. A vector of size 1 × wordNum is constructed for each organization name, where wordNum is the number of keywords counted over the organization names.
Preferably, the organization names in the sample data are converted into feature vectors using text-processing functions commonly used in machine learning.
S130. Build and train the naive Bayes model.
Specifically, a naive Bayes model is selected as the classifier and trained on the samples using the organization-name word vectors generated in step S120, yielding a classification model that accurately classifies AS organization names.
Specifically, the naive Bayes model is expressed mathematically as follows. For the labeled portion of the samples, the prior probability of each industry category is computed first. Let the training set be
$$T=\{(x_1,y_1),(x_2,y_2),\dots,(x_m,y_m)\},$$
where $x_i=(x_i^{(1)},x_i^{(2)},\dots,x_i^{(n)})^{\mathsf T}$ is the feature vector of the $i$-th training sample, $n$ is the dimension of the feature vector (the number of features), $m$ is the size of the training set, and $y_i\in\{C_1,C_2,C_3,C_4,C_5\}$, where $C_1,\dots,C_5$ correspond to the five industry categories education, cloud service provider, network operator, industrial enterprise and unaffiliated organization. The prior probability of industry category $C_k$ is
$$P(Y=C_k)=\frac{1}{m}\sum_{i=1}^{m}I(y_i=C_k),\qquad k=1,2,\dots,5,$$
where the indicator function $I(\cdot)$ returns 1 when its condition is true and 0 otherwise. The prior probability that the $j$-th feature $x^{(j)}$ is present is
$$P(X^{(j)}=1)=\frac{1}{m}\sum_{i=1}^{m}I(x_i^{(j)}=1).$$
Next, the conditional probability of each feature is computed, i.e. the probability that the $j$-th dimension of the feature vector equals 1 given industry category $C_k$:
$$P(X^{(j)}=1\mid Y=C_k)=\frac{\sum_{i=1}^{m}I(x_i^{(j)}=1,\;y_i=C_k)}{\sum_{i=1}^{m}I(y_i=C_k)},$$
where $I(x_i^{(j)}=1,\;y_i=C_k)$ returns 1 only when both conditions are true and 0 otherwise; correspondingly, $P(X^{(j)}=0\mid Y=C_k)=1-P(X^{(j)}=1\mid Y=C_k)$.
Then the posterior probability of industry category $C_k$ given feature vector $x$ is computed:
$$P(Y=C_k\mid X=x)=\frac{P(Y=C_k)\prod_{j=1}^{n}P(X^{(j)}=x^{(j)}\mid Y=C_k)}{\prod_{j=1}^{n}P(X^{(j)}=x^{(j)})},$$
and the feature vector $x$ is assigned to the category with the maximum posterior probability:
$$y=\arg\max_{C_k}P(Y=C_k)\prod_{j=1}^{n}P(X^{(j)}=x^{(j)}\mid Y=C_k),$$
where the denominator is dropped because it does not depend on $C_k$.
This completes the construction of the naive Bayes classifier for AS industry classification, which is then used to classify the data. Naive Bayes is computationally cheap, which saves computation and speeds up the classification of ASes.
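The S130 formulas can be sketched as a Bernoulli naive Bayes classifier over binary word-presence vectors. This is a minimal sketch with hypothetical function names, not the authors' code. It departs from the formulas above in two labeled ways: it adds Laplace (add-one) smoothing, so that a keyword never seen with a category does not force that category's probability to zero, and it compares log-probabilities to avoid numerical underflow. The feature-prior denominator is omitted because it is the same for every category.

```python
import math
from collections import Counter


def train_bernoulli_nb(X, y, alpha=1.0):
    """Estimate P(Y=C_k) and P(X^(j)=1 | Y=C_k) from binary vectors X and labels y.

    alpha is an add-alpha (Laplace) smoothing term; it must be > 0 because
    predict() takes logarithms of these probabilities.
    """
    m, n = len(X), len(X[0])
    class_counts = Counter(y)
    priors = {c: cnt / m for c, cnt in class_counts.items()}
    cond = {}
    for c, cnt in class_counts.items():
        rows = [x for x, label in zip(X, y) if label == c]
        ones = [sum(r[j] for r in rows) for j in range(n)]  # rows of class c with x^(j) = 1
        cond[c] = [(ones[j] + alpha) / (cnt + 2 * alpha) for j in range(n)]
    return priors, cond


def predict(x, priors, cond):
    """Return the class maximising log P(C_k) + sum_j log P(X^(j)=x^(j) | C_k)."""
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for j, xj in enumerate(x):
            p1 = cond[c][j]
            score += math.log(p1 if xj == 1 else 1.0 - p1)
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```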
S140. Complete the data classification.
Specifically, the naive Bayes classifier is used to classify all remaining ASes by industry, the corresponding industry categories are labeled, and the AS information is stored in the AS information database.
Example 2
This embodiment discloses the construction of the AS-number dictionary tree (trie) used in the IPv6 network space mapping method based on industry classification and a probability model; see FIG. 3. It corresponds to step S4 in FIG. 1.
In this step, the IPv6 address prefixes are extracted first, i.e. step S41. Specifically, the prefix of the start address of each IPv6 address segment owned by an AS is extracted. The preferred textual form of an IPv6 address is xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx, where each xxxx is a group of up to four hexadecimal digits (16 bits), for example 2001:1210:100:1::218, in which the double colon indicates omitted groups of zeros. Addresses in this format are converted to binary so that a dictionary tree of IPv6 address prefixes can then be built and the corresponding AS numbers stored in it.
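The conversion from the textual IPv6 form to the bit string used as a trie path can be done with Python's standard `ipaddress` module, which also expands the "::" shorthand. The helper name `ipv6_to_bits` is hypothetical.

```python
import ipaddress


def ipv6_to_bits(text, prefix_len=128):
    """Expand an IPv6 address (including '::' shorthand) and return its first prefix_len bits."""
    addr = ipaddress.IPv6Address(text)
    return format(int(addr), "0128b")[:prefix_len]
```

For example, `ipaddress.IPv6Address("2001:1210:100:1::218").exploded` gives `2001:1210:0100:0001:0000:0000:0000:0218`, and `ipv6_to_bits("2001:db8::", 16)` returns the 16 bits of the first group, `0010000000000001`.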
Next, the hierarchical dictionary tree is built and the AS numbers are stored in it, i.e. steps S42 and S43. Specifically, a dictionary tree of the IPv6 address prefixes corresponding to the AS numbers is built: for each AS number, a hierarchical path is created from its IPv6 address prefix, taking bit 0 as the left child and bit 1 as the right child, and the AS number is stored in the tree. If a node has a left or right child, the prefix it represents is extended by longer prefixes; otherwise the node is a last-level node representing a complete IPv6 address prefix. In this way the start addresses of the address segments owned by all ASes are stored in the dictionary tree, with each AS number stored at the last-level node of its prefix, so that IPv6 addresses can be matched to their AS numbers.
Then the AS number is found by matching level by level from the root node, i.e. step S44. Specifically, the AS number of an IPv6 address is matched as follows: starting from the root node, the address is matched against the stored prefixes bit by bit until the deepest matching node is reached, beyond which no child node matches the address. This node represents the IPv6 address prefix of the address, and the AS number stored at it is the AS to which the IPv6 address belongs.
Example 3
The embodiment discloses a port scanning probability model constructing method used in an IPv6 network space mapping method based on industry classification and probability models, and reference is made to FIG. 4. Also corresponding to steps S5 to S8 in fig. 1.
First, the method comprises the steps of full-port scanning and vulnerability assessment of partial IPv6 addresses, namely, step S5.
Specifically, for the IPv6 address for finishing the mapping operation of the autonomous organization number of the area and the industry category, the address is extracted from each industry category according to a certain proportion to carry out full port scanning, so that the condition that the network port in each industry category is opened is obtained.
Specifically, based on the full-port scan data, the conditional probability of the network port opening in each industry category is counted, and a probability model for the open port prediction of each industry is constructed, namely step S6.
Using this probability model, the open ports of all addresses in the live IPv6 address library are predicted according to each address's industry category. Port scanning and vulnerability assessment are then carried out on the predicted ports to obtain the open ports and security status of the whole IPv6 network (step S7).
Predicting a host's ports requires a certain number of its open ports to be known. For this reason, only the most commonly used ports are scanned first when choosing the best ports to predict for a host. Once at least one common port is known to be open, the vector formed by the known open ports can be used to predict the other open ports on the same host.
Finally, the parameters of the Bayesian network are updated with the port scanning data, which completes one round of scanning. In the next round, addresses are resampled for full-port scanning and the Bayesian network parameters are updated again. In this way the coverage of the network space mapping and the precision of vulnerability detection improve continuously (step S8).
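One way to make the per-round update cheap is to keep raw counts rather than only probabilities. Each round then adds its observations without reprocessing earlier rounds. The sketch below assumes that the "Bayesian network parameters" reduce to the pairwise counts behind $P(\text{port } b \mid \text{port } a)$. The class name `PortModel` is hypothetical.

```python
from collections import Counter
from itertools import permutations


class PortModel:
    """Count-based estimate of P(port b open | port a open) for one industry."""

    def __init__(self):
        self.single = Counter()  # hosts on which port a was seen open
        self.pair = Counter()    # hosts on which a and b were both seen open

    def update(self, scanned_hosts):
        """Fold in one round of results: an iterable of open-port sets."""
        for ports in scanned_hosts:
            self.single.update(ports)
            self.pair.update(permutations(ports, 2))

    def prob(self, b, a):
        """Return P(b open | a open), or 0.0 if port a has never been seen open."""
        if self.single[a] == 0:
            return 0.0
        return self.pair[(a, b)] / self.single[a]


model = PortModel()
model.update([{80, 443}, {80}])  # round 1
print(model.prob(443, 80))       # 0.5
model.update([{80, 443}])        # round 2
print(model.prob(443, 80))       # 0.666...
```

Results from the targeted scan in step S7 cover only the predicted ports, so folding them in alongside full-port samples may bias the counts. The resampled full-port scan of the next round helps keep the estimates grounded.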
When establishing an efficient scanning strategy, one must decide how thoroughly to scan IPv6 networks for hosts that respond with a given service. On the one hand, scanning the entire known IPv6 address space on a port increases the likelihood of finding every host that responds on that port. However, it consumes more bandwidth and may have a greater impact on the target network. On the other hand, scanning only the subnets of a seed subset on that port reduces bandwidth consumption, but it may miss other hosts in those subnets that also respond on the port. The user's bandwidth limit is therefore the key factor in deciding whether to scan a port's subnets exhaustively, and it is kept as a user-specified parameter.
The scanning strategy is carried out once the port prediction probability model of each industry has been built. The 20 commonly used ports are scanned on all addresses, the scan data are stored in the database, and an open vector over these 20 ports is constructed.
Preferably, this embodiment can be adjusted according to feedback from actual data so that more open ports on network assets are discovered.
Preferably, the 1500 ports with the highest conditional probabilities are selected for scanning, based on the open vector of the 20 common ports and the port prediction probability model of the corresponding industry. This number can be adjusted within a range of 100 according to the actual application.
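Choosing the top ports from the common-port open vector can be sketched as a ranking over candidate ports. The source does not say how evidence from several open ports is combined. This sketch assumes that a candidate port $b$ is scored by the maximum of $P(b \mid a)$ over the known-open common ports $a$. The function name and parameters are hypothetical. `k` defaults to the 1500 ports named in the text.

```python
import heapq


def top_k_candidate_ports(open_common, cond_prob, k=1500, exclude=frozenset()):
    """Rank unscanned ports by their best conditional probability.

    open_common: set of common ports found open on the host.
    cond_prob: {(a, b): P(b open | a open)} for the host's industry.
    exclude: ports already scanned (e.g. the closed common ports).
    """
    scores = {}
    for (a, b), p in cond_prob.items():
        if a in open_common and b not in open_common and b not in exclude:
            scores[b] = max(p, scores.get(b, 0.0))  # assumed max-aggregation
    return heapq.nlargest(k, scores, key=scores.get)


cond = {(80, 443): 0.9, (80, 8080): 0.4, (22, 8080): 0.7,
        (22, 3389): 0.1, (443, 80): 0.95}
print(top_k_candidate_ports({22, 80}, cond, k=2))  # [443, 8080]
```

`heapq.nlargest` avoids sorting the full candidate set when `k` is much smaller than the number of scored ports.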
The method can also be implemented as a standalone network space mapping and vulnerability assessment system built on it. In such a system, all detection operations on the IPv6 network space are performed against the same database.
Preferably, the shared database stores regional autonomous organization information and IPv6 asset information separately, and uses the regional autonomous organization number as a foreign key to associate the two.
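A two-table layout with the organization number as a foreign key can be sketched with SQLite from the Python standard library. The table and column names are hypothetical, and SQLite is only a stand-in for whatever database the system actually uses. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` is issued on the connection.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS org (
    org_number INTEGER PRIMARY KEY,   -- regional autonomous organization number
    org_name   TEXT NOT NULL,
    industry   TEXT NOT NULL          -- one of the five industry categories
);
CREATE TABLE IF NOT EXISTS ipv6_asset (
    address    TEXT NOT NULL,
    port       INTEGER NOT NULL,
    service    TEXT,
    org_number INTEGER NOT NULL REFERENCES org(org_number),
    PRIMARY KEY (address, port)
);
"""


def open_db(path=":memory:"):
    """Open the database, enable foreign-key enforcement, and create the tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    return conn


conn = open_db()
conn.execute("INSERT INTO org VALUES (64500, 'Example University', 'education')")
conn.execute("INSERT INTO ipv6_asset VALUES ('2001:db8::1', 443, 'https', 64500)")
print(conn.execute(
    "SELECT a.address, a.port, o.industry FROM ipv6_asset a JOIN org o USING (org_number)"
).fetchall())  # [('2001:db8::1', 443, 'education')]
```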
The system can also integrate open-source vulnerability identification systems. Once port detection is complete, these allow the vulnerability status of network assets to be assessed more accurately.
In this embodiment, the probability model consists of conditional probabilities built from network-layer and transport-layer data features. First, the live IPv6 address set is divided into network segments. Addresses from the set are then scanned on all ports to obtain a real-environment data set, and the live addresses are classified by industry. Next, conditional probabilities are computed over the real data set from transport-layer, application-layer, and network-layer features, yielding probability models that predict ports and services. A scanning database is constructed, and the model is continuously updated and iterated to obtain new port prediction models, which are applied and updated in the next vulnerability scan. Because the ports to scan are predicted from industry-specific port probability models, vulnerability scanning saves bandwidth and has less impact on the target network. At the same time, for a given bandwidth, the method obtains more comprehensive asset and security status information than the traditional approach of scanning networks on a preset list of ports. In addition, because the probability model is updated continuously with real-time data, the information it provides stays current.
FIG. 1 is the overall flow chart of the IPv6 network space mapping method based on industry classification and probability models according to the present invention. The flow runs as follows:
- steps S1 and S2, cleaning the regional autonomous organization data;
- step S3, the industry classification model;
- step S4, labeling the industry category of IPv6 addresses;
- step S5, sampling IPv6 addresses for full-port scanning;
- step S6, constructing port probability models by industry;
- step S7, scanning the remaining IPv6 addresses;
- finally, step S8, updating and starting the next round of scanning.

The update path leads from step S7 back to step S6, and the next round of scanning starts by returning from step S6 to step S5. Steps S5, S6, S7, and S8 are collectively referred to as the scanning flow.
FIG. 2 is a schematic diagram of the industry classification method for regional autonomous organizations according to an embodiment of the invention, that is, steps S1 to S3. It comprises, in sequence: step S100, acquiring regional autonomous organization information; step S110, cleaning the regional autonomous organization data; step S120, converting the regional autonomous organization text into word feature vectors; step S130, constructing and training a naive Bayes model; and step S140, completing the data classification.
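Steps S110 to S140 can be sketched end to end with the standard library. This is a minimal sketch under stated assumptions, not the patent's classifier:
- Names are assumed to be ASCII.
- The cleaning follows the symbol-to-space and letter/digit-splitting rules of claim 2.
- Per-class keyword counts (equivalent to summing each class's rows of the bag-of-words matrix) stand in for explicit 1 × wordNum vectors.
- Multinomial naive Bayes with Laplace smoothing is swapped in for the unsmoothed frequency estimates of claim 1, so that unseen word/class pairs do not zero out a class.

The class name and the training names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def clean_name(name):
    """Symbols become spaces, letter/digit runs are split, tokens are lower-cased.
    Non-ASCII characters are treated as symbols (ASCII names assumed)."""
    name = re.sub(r"[^0-9A-Za-z]+", " ", name)
    name = re.sub(r"(?<=[A-Za-z])(?=[0-9])|(?<=[0-9])(?=[A-Za-z])", " ", name)
    return name.lower().split()


class NaiveBayesClassifier:
    """Multinomial naive Bayes over bag-of-words counts with Laplace smoothing."""

    def fit(self, names, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # per-class keyword counts
        for name, label in zip(names, labels):
            self.word_counts[label].update(clean_name(name))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {k: sum(self.word_counts[k].values()) for k in self.class_counts}
        return self

    def predict(self, name):
        m = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for k, n_k in self.class_counts.items():
            score = math.log(n_k / m)  # log prior, log P(Y = C_k)
            for w in clean_name(name):
                if w in self.vocab:  # words unseen in training are skipped
                    likelihood = (self.word_counts[k][w] + 1) / (self.totals[k] + v)
                    score += math.log(likelihood)
            if score > best_score:  # maximum a posteriori class
                best, best_score = k, score
        return best


names = ["Xiangtan University", "Example University Network",
         "Example Cloud Computing Ltd", "Cloud Hosting Services",
         "Example Telecom Corporation", "Mobile Telecom Backbone"]
labels = ["education", "education", "cloud service provider",
          "cloud service provider", "operator", "operator"]
clf = NaiveBayesClassifier().fit(names, labels)
print(clf.predict("Another University"))  # education
print(clf.predict("Cloud Services"))      # cloud service provider
```

Working in log space avoids underflow when a name has many keywords. The denominator of Bayes' rule is the same for every class, so it can be dropped when taking the argmax.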
FIG. 3 is a schematic diagram of building the dictionary tree of IPv6 address prefixes corresponding to regional autonomous organization numbers according to an embodiment of the present invention, that is, step S4. It comprises, in sequence: step S41, extracting IPv6 address prefixes; step S42, constructing a hierarchical dictionary tree; step S43, storing the regional autonomous organization numbers in the dictionary tree; and step S44, matching level by level from the root node to find the regional autonomous organization number.
FIG. 4 is a schematic diagram of the port scan probability model constructed during scanning according to an embodiment of the present invention, that is, steps S5 to S8. Step S5 samples IPv6 addresses for full-port scanning, stores the port scan data in a structured database, and calculates the corresponding conditional probabilities. Step S6 constructs port probability models by industry. Step S7 scans the remaining IPv6 addresses. Step S8 updates the models and returns to step S5 to start the next round of scanning.
The foregoing examples are provided to illustrate the technical solution of the present invention clearly and are not to be construed as limiting its embodiments. Equivalent technical features may be changed or modified without departing from the basic idea and essence of the present invention, and such changes fall within the scope of the claims.

Claims (9)

1. An IPv6 network space mapping method based on industry classification and a probability model, characterized by comprising the following eight steps:
S1: randomly sample organization names from a public autonomous system library according to a set threshold to obtain samples, and label each sample with one of five industry categories: education, cloud service provider, operator, industrial enterprise, or unaffiliated organization;
S2: select a certain proportion of the extracted samples as the data set, clean and preprocess the data, remove unattributed regional autonomous organization names, and convert the remaining names into word-vector representations;
S3: based on the word vectors of the regional autonomous organization names, classify the names by industry using a naive Bayes model as the classifier;
Step S3 comprises the following sub-steps:
S3a: calculate the prior probability of each industry category. For the training set
$$T = \{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})\},$$
$x^{(i)} = (x_1^{(i)}, x_2^{(i)}, \dots, x_n^{(i)})^{T} \in \mathcal{X}$ is the feature vector of the $i$-th item of training data $(x^{(i)}, y^{(i)})$, with $y^{(i)} \in \{C_1, C_2, C_3, C_4, C_5\}$. Here $C_1, C_2, C_3, C_4, C_5$ correspond to the five industry categories education, cloud service provider, operator, industrial enterprise, and unaffiliated organization, respectively. $m$ is the size of the training set, and $n$ is the dimension of the feature vector (the number of features). The prior probability of industry category $C_k$ is
$$P(Y = C_k) = \frac{\sum_{i=1}^{m} I(y^{(i)} = C_k)}{m}, \quad k = 1, 2, \dots, 5,$$
where the indicator function $I(y^{(i)} = C_k)$ returns 1 when $y^{(i)} = C_k$ and 0 otherwise. The prior probability of the $j$-th feature $x_j$ is
$$P(X_j = x_j) = \frac{\sum_{i=1}^{m} I(x_j^{(i)} = x_j)}{m};$$
S3b: calculate the conditional probability of each feature, that is, the probability that the $j$-th dimension of the feature vector takes the value $x_j$ given industry category $C_k$:
$$P(X_j = x_j \mid Y = C_k) = \frac{\sum_{i=1}^{m} I(x_j^{(i)} = x_j,\ y^{(i)} = C_k)}{\sum_{i=1}^{m} I(y^{(i)} = C_k)},$$
where $I(y^{(i)} = C_k)$ returns 1 when $y^{(i)} = C_k$ holds and 0 otherwise, and $I(x_j^{(i)} = x_j,\ y^{(i)} = C_k)$ returns 1 when both $x_j^{(i)} = x_j$ and $y^{(i)} = C_k$ hold and 0 otherwise;
S3c: calculate the conditional probability of each industry category, that is, the posterior probability that the industry category is $C_k$ given the feature vector $x$:
$$P(Y = C_k \mid X = x) = \frac{P(Y = C_k) \prod_{j=1}^{n} P(X_j = x_j \mid Y = C_k)}{\prod_{j=1}^{n} P(X_j = x_j)}.$$
The feature vector $x$ is then classified by maximum posterior probability:
$$y = \arg\max_{C_k} P(Y = C_k \mid X = x);$$
S4: acquire a live IPv6 address library from the Internet, and label each IPv6 address with its industry category according to the public autonomous system library;
S5: from the IPv6 addresses that have been classified by industry category, draw a certain proportion of addresses from each industry category, and carry out full-port scanning on them;
S6: based on the full-port scan data, compute the conditional probabilities of network ports being open in each industry category, and construct an open-port prediction probability model for each industry;
S7: using the resulting probability model, obtain the ports predicted to be open for all addresses in the live IPv6 address library according to their industry categories, and carry out port scanning and security vulnerability assessment;
S8: update the parameters of the Bayesian network with the data obtained from port scanning, ending the current round of scanning; in the next round, resample, carry out full-port scanning, and update the Bayesian network again to obtain more accurate information about open network ports.
2. The IPv6 network space mapping method based on industry classification and probability model of claim 1, wherein step S2 comprises the sub-steps of:
S2a: replace symbols in the organization names with spaces, separate runs of English letters from runs of digits with spaces, and remove unattributed or duplicate regional autonomous organization names, so that all organization names share a uniform format;
S2b: extract keywords from the organization names with a bag-of-words model and construct feature vectors, each organization name corresponding to a vector of size 1 × wordNum, where wordNum is the number of keywords counted across the organization names;
S2c: store the resulting text feature vectors as a two-dimensional bag-of-words feature matrix in which each row represents one organization name and each column corresponds to one keyword; each element is the number of times that keyword occurs in the corresponding organization name. The matrix is used for subsequent classifier training and industry classification.
3. The IPv6 network space mapping method based on industry classification and probability model of claim 1, wherein step S4 comprises the sub-steps of:
S4a: build a hierarchical dictionary tree from the IPv6 address prefixes of each regional autonomous organization number, taking 0 as the left child node and 1 as the right child node, and store the corresponding regional autonomous organization number in the dictionary tree;
S4b: to find the regional autonomous organization number to which an IPv6 address belongs, start from the root node and match the IPv6 address prefix bit by bit, find the deepest matching node, and obtain the corresponding regional autonomous organization number from that node; then label the IPv6 address with the industry category corresponding to that regional autonomous organization number.
4. The IPv6 network space mapping method based on industry classification and probability model of claim 1, wherein step S5 comprises the sub-steps of:
S5a: sample the IPv6 addresses of each industry category to obtain a certain number of IPv6 address samples, perform full-port detection on all samples with the ZMapv6 tool, discard hosts with more than 100 open ports, and store the detected open-port information in a database;
S5b: use the Masscan tool to collect the service, system version, and open-protocol information for the open ports of each IPv6 address in the database, and store this information in the database.
5. The IPv6 network space mapping method based on industry classification and probability model of claim 1, wherein step S6 comprises the sub-steps of:
S6a: to construct the scanning-port prediction probability model of each industry category, first calculate, from the acquired IPv6 full-port mapping data, the conditional probabilities of three types of features:
- the transport-layer feature $P_1(\text{port}_b \mid \text{port}_{a,\text{open}})$, the probability that port b is open when port a of the host is open;
- the class-1 combined transport- and application-layer feature $P_2(\text{port}_b \mid (\text{port}_{a,\text{open}}, \text{protocol}_k))$, the probability that port b is open when the protocol on the host's open port a contains a specific protocol feature value;
- the class-2 combined transport- and application-layer feature $P_3(\text{port}_b \mid (\text{port}_{a,\text{open}}, \text{port}_{\text{banner}}))$, the probability that port b is open when port a of the host is open and the host's response message is a specific banner;

S6b: store the three conditional probabilities from S6a in the database in descending order, each as a tuple of the probability condition and the probability result, to construct the port prediction probability model.
6. The IPv6 network space mapping method based on industry classification and probability model of claim 1, wherein step S7 comprises the sub-steps of:
S7a: scan the 20 commonly used ports on all addresses remaining after sampling, store the scan results in the database, and construct the open vector of these 20 ports;
S7b: based on the open vector of the 20 common ports, compare the conditional probabilities in the port prediction probability model of each industry, select the 1500 ports with the highest conditional probabilities, scan them, store the results in the database, and end the current round of scanning.
7. The IPv6 network space mapping method based on industry classification and probability model of claim 1, wherein step S8 comprises the sub-steps of:
S8a: after a round of scanning is complete, recompute the three conditional probabilities of S6a from the open-port data of each of the five industries, and update the port prediction probability model;
S8b: randomly sample a new batch of addresses for full-port scanning, and update the port prediction probability model.
8. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the industry classification and probability model based IPv6 network space mapping method according to any one of claims 1 to 7.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the industry classification and probability model based IPv6 network space mapping method of any one of claims 1 to 7 when the computer program is executed.
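The three conditional probabilities of claim 5 (S6a), together with the descending-order storage of S6b, can be sketched in one pass over the scan data. The input format is hypothetical: one dict per host mapping each open port to a `(protocol, banner)` pair. The condition tags `"P1"`, `"P2"`, `"P3"` are likewise illustrative. Each model entry is a `(condition, result_port, probability)` tuple.

```python
from collections import Counter


def build_port_model(hosts):
    """Compute P1, P2, P3 and return them sorted by probability, highest first.

    hosts: list of {port: (protocol, banner)} dicts, one per host, open ports only.
    Returns a list of (condition, port_b, probability) tuples.
    """
    cond_counts = Counter()   # hosts on which each condition holds
    joint_counts = Counter()  # hosts on which the condition holds and port b is open
    for host in hosts:
        for a, (protocol, banner) in host.items():
            conditions = [("P1", a),             # port a open
                          ("P2", a, protocol),   # port a open with this protocol
                          ("P3", a, banner)]     # port a open with this banner
            for cond in conditions:
                cond_counts[cond] += 1
                for b in host:
                    if b != a:
                        joint_counts[(cond, b)] += 1
    model = [(cond, b, n / cond_counts[cond]) for (cond, b), n in joint_counts.items()]
    model.sort(key=lambda entry: entry[2], reverse=True)  # descending, as in S6b
    return model


hosts = [{80: ("http", "nginx"), 443: ("https", "nginx")},
         {80: ("http", "apache")}]
for entry in build_port_model(hosts):
    print(entry)
```

The banner condition separates the first host (nginx, 443 also open) from the second (apache, 443 closed). The plain port condition $P_1$ cannot make that distinction, which is the motivation for the combined features.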
CN202311119847.7A 2023-09-01 2023-09-01 IPv6 network space mapping method based on industry classification and probability model Active CN116846690B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311119847.7A CN116846690B (en) 2023-09-01 2023-09-01 IPv6 network space mapping method based on industry classification and probability model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311119847.7A CN116846690B (en) 2023-09-01 2023-09-01 IPv6 network space mapping method based on industry classification and probability model

Publications (2)

Publication Number Publication Date
CN116846690A CN116846690A (en) 2023-10-03
CN116846690B true CN116846690B (en) 2023-11-03

Family

ID=88174698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311119847.7A Active CN116846690B (en) 2023-09-01 2023-09-01 IPv6 network space mapping method based on industry classification and probability model

Country Status (1)

Country Link
CN (1) CN116846690B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117473512B (en) * 2023-12-28 2024-03-22 湘潭大学 Vulnerability risk assessment method based on network mapping

Citations (6)

Publication number Priority date Publication date Assignee Title
CN112995187A (en) * 2021-03-09 2021-06-18 中国人民解放军空军工程大学 Network cooperative defense system and method based on community structure
CN114500346A (en) * 2022-04-08 2022-05-13 北京华顺信安科技有限公司 Network space mapping method and device
CN114817928A (en) * 2022-04-02 2022-07-29 安天科技集团股份有限公司 Network space data fusion analysis method and system, electronic device and storage medium
CN115296892A (en) * 2022-08-02 2022-11-04 中国电子科技集团公司信息科学研究院 Data information service system
CN115834368A (en) * 2021-11-29 2023-03-21 中国南方电网有限责任公司超高压输电公司 System for identifying network space asset information
CN116405275A (en) * 2023-03-29 2023-07-07 中国科学院沈阳自动化研究所 Attack organization dynamic identification method based on network space detection behavior

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US9699209B2 (en) * 2014-12-29 2017-07-04 Cyence Inc. Cyber vulnerability scan analyses with actionable feedback
US10158654B2 (en) * 2016-10-31 2018-12-18 Acentium Inc. Systems and methods for computer environment situational awareness

Non-Patent Citations (1)

Title
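网络空间测绘系统分类及应用综述 (A survey of the classification and application of cyberspace mapping systems); 刘红 et al.; 《信息技术与网络安全》 (Information Technology and Network Security); full text *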

Also Published As

Publication number Publication date
CN116846690A (en) 2023-10-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20231003

Assignee: Beijing China Silicon Union Technology Co.,Ltd.

Assignor: XIANGTAN University

Contract record no.: X2023980052552

Denomination of invention: IPv6 Network Space Mapping Method Based on Industry Classification and Probability Model

Granted publication date: 20231103

License type: Exclusive License

Record date: 20231215