CN113746947A - IPv6 active address detection method and device based on reinforcement learning - Google Patents

IPv6 active address detection method and device based on reinforcement learning

Info

Publication number
CN113746947A
Authority
CN
China
Prior art keywords
density
address
addresses
active
space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110801982.4A
Other languages
Chinese (zh)
Other versions
CN113746947B (en)
Inventor
杨家海
宋光磊
何林
王之梁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202110801982.4A priority Critical patent/CN113746947B/en
Publication of CN113746947A publication Critical patent/CN113746947A/en
Application granted granted Critical
Publication of CN113746947B publication Critical patent/CN113746947B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/50 Address allocation
    • H04L61/5046 Resolving address allocation conflicts; Testing of addresses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2101/00 Indexing scheme associated with group H04L61/00
    • H04L2101/60 Types of network addresses
    • H04L2101/618 Details of network addresses
    • H04L2101/659 Internet protocol version 6 [IPv6] addresses

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application provides an IPv6 active address detection method and device based on reinforcement learning. The method comprises the following steps: acquiring IPv6 seed addresses and determining a plurality of high-density regions of the seed addresses; iteratively detecting each high-density region through a pre-trained multi-armed bandit model, which comprises generating a preset number of target addresses in each high-density region and detecting whether each target address is an active address; determining the number of active addresses and the number of inactive addresses among the preset number of target addresses, and updating the expected reward of the corresponding high-density region according to these two numbers; and repeating the above steps so that, through iterative detection of each high-density region, the density distribution of the seed addresses converges to the density distribution of the active addresses. The method moves the density distribution of the seed addresses toward the actual active address distribution, so that high-density regions of active addresses can be determined in the network and the efficiency of detecting active addresses is improved.

Description

IPv6 active address detection method and device based on reinforcement learning
Technical Field
The application relates to the technical field of computer networks, in particular to an IPv6 active address detection method and device based on reinforcement learning.
Background
Currently, when detecting active IPv6 addresses in a target area, if there are enough seed addresses in the target area, a target address generation algorithm based on the seed addresses can generally be used to detect the active IPv6 addresses. Assuming that the active address seeds in the target area are sampled uniformly, the density distribution of the seed addresses is consistent with the density distribution of the actual active addresses in the area, and the higher the density of seed addresses in a region, the higher the probability of detecting active addresses there. It is therefore desirable to find the high-density regions of the seed addresses and perform address detection in those regions, so as to detect a large number of active addresses.
However, due to the influence of factors such as sampling deviation of the seed address, the density distribution of the seed address may not be consistent with the density distribution of the actual active address in the target area, so that the address detection method in the related art may perform address detection in a plurality of areas with low density of the actual active address, thereby reducing detection efficiency and wasting detection resources.
Disclosure of Invention
The present application is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, a first objective of the present application is to provide an IPv6 active address detection method based on reinforcement learning, where the method updates the density distribution of seed addresses according to rewards of active addresses in each iterative detection, and moves the density distribution of seed addresses to an actual active address distribution, so as to determine a high-density region of active addresses in a real network, and perform address detection in the high-density region, thereby improving efficiency of detecting active addresses and saving detection resources.
The second purpose of the invention is to provide an IPv6 active address detection device based on reinforcement learning.
A third object of the invention is to propose a non-transitory computer-readable storage medium.
In order to achieve the above object, an embodiment of the first aspect of the present invention provides a reinforcement learning-based IPv6 active address detection method, including the following steps:
S1: acquiring an IPv6 seed address, and determining a plurality of high-density areas of the seed address;
S2: detecting each of the high-density regions through a pre-trained multi-armed bandit model, comprising: generating a preset number of target addresses in each high-density area, and detecting whether each target address is an active address;
S3: determining the number of active addresses and the number of inactive addresses in the preset number of target addresses, and updating the expected reward of the corresponding high-density area according to the number of the active addresses and the number of the inactive addresses;
S4: repeatedly performing the steps S2 and S3 to converge the density distribution of the seed addresses to the density distribution of the active addresses by iteratively detecting each of the high-density regions.
Optionally, in an embodiment of the present application, the plurality of high-density regions of the seed address are determined through a density space tree, where the root node of the density space tree represents the active address space and the leaf nodes of the density space tree represent high-density regions of the seed address. After step S4, the method further includes: merging the leaf nodes of the density space tree into the corresponding parent nodes.
Optionally, in an embodiment of the present application, merging leaf nodes of the density space tree into corresponding parent nodes includes: performing probe address merging, reward merging and space merging on leaf nodes, wherein the space merging is performed by the following formula:
$$f.\mathit{var\_space} = \bigcup_{i=1}^{j} x_i.\mathit{var\_space}$$
where f.var_space is the variable space of the parent node, $x_i$.var_space is the variable space of any leaf node, and j is the number of leaf nodes corresponding to the parent node.
Optionally, in an embodiment of the present application, the preset number of the target addresses is calculated by the following formula:
$$N(x_i) = b \cdot p(x_i)$$
wherein,
Figure BDA0003164959530000022
where $N(x_i)$ is the preset number of target addresses for any high-density region $x_i$, b is the budget consumed per probing iteration, $p(x_i)$ is the probability of generating target addresses in region $x_i$, $R_i$ is the expected reward of region $x_i$, $V_i$ is the address space dimension of region $x_i$, and n is the preset number of regions to be probed.
Optionally, in one embodiment of the present application, the desired reward for the corresponding high-density zone is updated by the following formula:
$$R_i^{t+1} = \mathrm{Beta}\left(\alpha_i^{t} + \alpha^{*},\ \beta_i^{t} + \beta^{*}\right)$$
where $R_i^{t+1}$ is the expected reward of any high-density region $x_i$ after the (t+1)-th iteration, Beta denotes the Beta distribution, $\alpha_i^{t}$ is the number of active addresses of region $x_i$ determined after the t-th iteration, $\beta_i^{t}$ is the number of inactive addresses of region $x_i$ determined after the t-th iteration, $\alpha^{*}$ is the number of newly found active addresses after the (t+1)-th iteration, and $\beta^{*}$ is the number of newly found inactive addresses after the (t+1)-th iteration.
In order to achieve the above purpose, an embodiment of the second aspect of the present application provides an apparatus for detecting an IPv6 active address based on reinforcement learning, including the following modules:
the acquisition module is used for acquiring an IPv6 seed address and determining a plurality of high-density areas of the seed address;
a detection module for detecting each of the high-density regions through a pre-trained multi-armed bandit model, the detection module being specifically configured to: generate a preset number of target addresses in each high-density area, and detect whether each target address is an active address;
the updating module is used for determining the number of active addresses and the number of inactive addresses in the preset number of target addresses and updating the expected reward of the corresponding high-density area according to the number of the active addresses and the number of the inactive addresses;
and the iteration module is used for controlling the detection module and the updating module to repeatedly run so as to make the density distribution of the seed addresses converge to the density distribution of the active addresses by performing iteration detection on each high-density area.
Optionally, in an embodiment of the present application, the obtaining module determines a plurality of high-density regions of the seed address through a density space tree, where a root node of the density space tree represents an active address space, and a leaf node of the density space tree represents a high-density region of the seed address, and the address detection apparatus further includes:
and the merging module is used for merging the leaf nodes of the density space tree to the corresponding father nodes.
Optionally, in an embodiment of the present application, the merging module is further configured to perform probe address merging, reward merging, and space merging on the leaf nodes, where the merging module is specifically configured to perform the space merging according to the following formula:
$$f.\mathit{var\_space} = \bigcup_{i=1}^{j} x_i.\mathit{var\_space}$$
where f.var_space is the variable space of the parent node, $x_i$.var_space is the variable space of any leaf node, and j is the number of leaf nodes corresponding to the parent node.
Optionally, in an embodiment of the present application, the detection module is further configured to calculate the preset number of the target addresses by the following formula:
$$N(x_i) = b \cdot p(x_i)$$
wherein,
Figure BDA0003164959530000032
where $N(x_i)$ is the preset number of target addresses for any high-density region $x_i$, b is the budget consumed per probing iteration, $p(x_i)$ is the probability of generating target addresses in region $x_i$, $R_i$ is the expected reward of region $x_i$, $V_i$ is the address space dimension of region $x_i$, and n is the preset number of regions to be probed.
Optionally, in an embodiment of the present application, the updating module is specifically configured to update the desired reward of the corresponding high-density area by the following formula:
$$R_i^{t+1} = \mathrm{Beta}\left(\alpha_i^{t} + \alpha^{*},\ \beta_i^{t} + \beta^{*}\right)$$
where $R_i^{t+1}$ is the expected reward of any high-density region $x_i$ after the (t+1)-th iteration, Beta denotes the Beta distribution, $\alpha_i^{t}$ is the number of active addresses of region $x_i$ determined after the t-th iteration, $\beta_i^{t}$ is the number of inactive addresses of region $x_i$ determined after the t-th iteration, $\alpha^{*}$ is the number of newly found active addresses after the (t+1)-th iteration, and $\beta^{*}$ is the number of newly found inactive addresses after the (t+1)-th iteration.
The technical effects of this application: according to the method, the high-density region of the seed address is found by using the density space tree, and then each generated high-density region of the seed address is subjected to iterative detection by using a reinforcement learning method. And updating the density distribution of the seed addresses according to the rewards of the active addresses in each iterative detection, so that the density distribution of the seed addresses moves to the actual active address distribution to correct the problem of inconsistency with the active address distribution caused by sampling deviation of the seed addresses, thereby determining a high-density area of the active addresses in a real network and carrying out address detection in the high-density area, thereby improving the efficiency of detecting the active addresses and saving detection resources.
To achieve the above object, a non-transitory computer-readable storage medium is provided in an embodiment of the third aspect of the present application, and a computer program is stored on the non-transitory computer-readable storage medium, and when executed by a processor, the computer program implements the reinforcement learning based IPv6 active address detection method described in the embodiment of the first aspect of the present application.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flowchart of an IPv6 active address probing method based on reinforcement learning according to an embodiment of the present application;
FIG. 2 is a diagram illustrating a specific IPv6 active address probing workflow provided by an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a comparison effect of a BGP prefix space algorithm provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a Gasser data set algorithm comparison effect provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an IPv6 active address detection apparatus based on reinforcement learning according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The method and the device for detecting the active address of the IPv6 based on reinforcement learning according to the embodiments of the present application are described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of an IPv6 active address detection method based on reinforcement learning according to an embodiment of the present application, where the method includes the following steps, as shown in fig. 1:
S1: The IPv6 seed address is obtained, and a plurality of high-density areas of the seed address are determined.
When detecting active Internet Protocol Version 6 (IPv6) addresses in a target area, the present application first collects the seed addresses in the target area. In a specific implementation, the seed addresses in the target area may be obtained through any of the schemes for collecting IPv6 seed addresses in the related art, which are not described in detail here.
Further, in an embodiment of the present application, multiple high-density regions of the obtained seed address may be found in a linear time through a density space tree, where a root node of the density space tree represents an active address space, and leaf nodes of the density space tree represent high-density regions of the seed address.
S2: Each high-density region is detected by a pre-trained multi-armed bandit model, which includes: generating a preset number of target addresses in each high-density region, and detecting whether each target address is an active address.
A multi-armed bandit model is trained in advance; that is, the model is built by reinforcement learning. After the high-density regions of the seed addresses are determined, the multi-armed bandit model based on Thompson sampling is used to dynamically update the density distribution of the active address space. In other words, the implementation steps and formulas of the active address detection in this application are determined from the characteristics of the trained multi-armed bandit model, which corrects the inconsistency between the density distribution of the seed addresses and the density distribution of the actual active addresses.
In one embodiment of the present application, one possible implementation of the pre-trained multi-armed bandit model is as follows. The IPv6 address space is divided into regions of different density $X = \{x_1, x_2, \ldots, x_k\}$, and each region is one arm of the multi-armed bandit. This yields k actions $A = \{a_1, a_2, \ldots, a_k\}$, where $a_i$ refers to scanning $x_i$ and detecting whether the target address is active, $i \in [1, k]$, and $\theta = \{\theta_1, \theta_2, \ldots, \theta_k\}$ denotes the average rewards. The reward of each arm follows a Bernoulli distribution with parameter $\theta_i$:
$$P(r \mid a_i) = \theta_i^{\,r}\,(1-\theta_i)^{1-r}, \quad r \in \{0, 1\}$$
where r is a random variable: executing action $a_i$ yields a reward of 1 with probability $\theta_i$ and a reward of 0 with probability $1-\theta_i$, so $\theta_i$ can be interpreted as the success probability or the average reward of the action.
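As a rough illustration of this reward model, the following Python sketch simulates one arm whose probes succeed with probability θ_i. The function names and the seeded pseudo-random generator are assumptions made for the example; real probing would send packets rather than draw random numbers.

```python
import random


def pull_arm(theta_i: float, rng: random.Random) -> int:
    """Simulate action a_i: reward 1 with probability theta_i, else 0."""
    return 1 if rng.random() < theta_i else 0


def empirical_mean_reward(theta_i: float, pulls: int, seed: int = 0) -> float:
    """Average reward over many simulated probes of the same region."""
    rng = random.Random(seed)
    return sum(pull_arm(theta_i, rng) for _ in range(pulls)) / pulls
```

Over many pulls the empirical mean approaches θ_i, which is why θ_i can be read as the average reward of the action.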
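It should be noted that the model places an independent prior distribution, namely a Beta distribution, on each $\theta_k$, with parameters $A = \{\alpha_1, \ldots, \alpha_k\}$ and $B = \{\beta_1, \ldots, \beta_k\}$. For each action $a_k$, the prior probability density function of $\theta_k$ is:
$$p(\theta_k) = \frac{\Gamma(\alpha_k + \beta_k)}{\Gamma(\alpha_k)\,\Gamma(\beta_k)}\,\theta_k^{\alpha_k - 1}\,(1 - \theta_k)^{\beta_k - 1}$$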
where Γ is the gamma function. The distribution is updated according to Bayes' rule as observations are collected. Because the Beta distribution is conjugate to the Bernoulli likelihood, the posterior distribution of each action is also a Beta distribution, whose parameters can be updated according to the following equation (2):
$$(\alpha_i, \beta_i) \leftarrow \begin{cases} (\alpha_i + r,\ \beta_i + 1 - r) & \text{if region } x_i \text{ is probed} \\ (\alpha_i,\ \beta_i) & \text{otherwise} \end{cases} \tag{2}$$
That is, a region $x_i$ is first selected and a target address in it is scanned. If the target address is found to be active (reward = 1), one is added to the corresponding $\alpha_i$ ($\beta_i$ remains unchanged); otherwise (reward = 0), one is added to the corresponding $\beta_i$ ($\alpha_i$ remains unchanged). Here $\alpha_i$ denotes the number of active addresses detected in region $x_i$ and $\alpha_i + \beta_i$ denotes the address budget consumed by region $x_i$, so the active address hit rate of region $x_i$ is $\alpha_i / (\alpha_i + \beta_i)$. As the following formula shows, the expected reward of $x_i$ is proportional to its active address hit rate.
Figure BDA0003164959530000061
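The per-probe posterior update described above can be sketched in a few lines of Python. The class name `RegionPosterior` and the uniform Beta(1, 1) starting point are assumptions for illustration only; with that prior, the ratio α/(α+β) includes the two prior pseudo-counts.

```python
from dataclasses import dataclass


@dataclass
class RegionPosterior:
    """Beta(alpha, beta) belief about the hit rate of one region x_i."""
    alpha: int = 1  # assumed prior pseudo-count of active addresses
    beta: int = 1   # assumed prior pseudo-count of inactive addresses

    def update(self, reward: int) -> None:
        # reward == 1: address was active, so alpha grows; otherwise beta grows.
        if reward == 1:
            self.alpha += 1
        else:
            self.beta += 1

    def hit_rate(self) -> float:
        # alpha / (alpha + beta), the posterior mean of the Beta distribution.
        return self.alpha / (self.alpha + self.beta)


posterior = RegionPosterior()
for reward in [1, 1, 0, 1]:
    posterior.update(reward)
```

After three active and one inactive result, the belief is Beta(4, 2), and `hit_rate()` returns 4/6.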
It should be noted that a key issue in IPv6 active address probing is achieving a high active address hit rate within a given budget. Assuming that our probing budget is B, the objective function of address probing is f, and active address probing is a combinatorial optimization problem of (X, B, f), they need to satisfy the following relations:
Figure BDA0003164959530000062
where n represents the number of regions in which target addresses are generated. The objective function f represents the hit rate of active addresses within the target address budget B, and is determined by the following equation (5):
Figure BDA0003164959530000063
Among the solutions that satisfy
Figure BDA0003164959530000064
the most effective is the solution $x^{*}$, which satisfies $f(x^{*}) \geq f(x)$,
Figure BDA0003164959530000065
Therefore, the determined high-density regions of the seed addresses are probed in turn through the multi-armed bandit model, so that each high-density region is probed iteratively, the expected reward of the corresponding high-density region is updated, and the density distribution of the seed addresses is updated in turn.
In an embodiment of the present application, the preset number of generated target addresses may be set according to actual needs, for example, the preset number is determined according to an affordable risk degree of current address detection. As a possible implementation, the preset number of target addresses may be calculated by the following formula:
$$N(x_i) = b \cdot p(x_i)$$
wherein,
Figure BDA0003164959530000066
where $N(x_i)$ is the preset number of target addresses for any high-density region $x_i$, b is the budget consumed per probing iteration, $p(x_i)$ is the probability of generating target addresses in region $x_i$, $R_i$ is the expected reward of region $x_i$, $V_i$ is the address space dimension of region $x_i$, and n is the preset number of regions to be probed.
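To make the budget split concrete, here is a minimal Python sketch of N(x_i) = b · p(x_i). It takes the probabilities p(x_i) as given inputs, since their defining expression is only available as a figure; the function name and the choice to round down to whole addresses are assumptions for illustration.

```python
import math


def allocate_budget(b: int, probabilities: list) -> list:
    """Return N(x_i) = floor(b * p(x_i)) for each region.

    `probabilities` is assumed to be non-negative and to sum to at most 1,
    so the allocation never exceeds the per-iteration budget b.
    """
    if any(p < 0 for p in probabilities):
        raise ValueError("probabilities must be non-negative")
    if sum(probabilities) > 1 + 1e-9:
        raise ValueError("probabilities must sum to at most 1")
    return [math.floor(b * p) for p in probabilities]


counts = allocate_budget(100, [0.5, 0.25, 0.25])
```

With a budget of 100 and probabilities 0.5, 0.25 and 0.25, the three regions receive 50, 25 and 25 target addresses.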
Further, whether each generated target address is active may be detected by any scheme in the related art for detecting whether an address is active, which is not described here again.
S3: Determine the number of active addresses and the number of inactive addresses among the preset number of target addresses, and update the expected reward of the corresponding high-density area according to the number of active addresses and the number of inactive addresses.
S4: repeatedly performing the steps S2 and S3 to converge the density distribution of the seed addresses to the density distribution of the active addresses by iteratively detecting each of the high-density regions.
Specifically, in each iteration detection of each high-density area, whether each target address generated in the current high-density area is an active address or an inactive address is detected in sequence, and the number of active addresses and the number of inactive addresses in the target addresses generated by the high-density area are determined.
Further, in one embodiment of the present application, the desired reward for the corresponding high-density zone may be updated by the following formula:
$$R_i^{t+1} = \mathrm{Beta}\left(\alpha_i^{t} + \alpha^{*},\ \beta_i^{t} + \beta^{*}\right)$$
where $R_i^{t+1}$ is the expected reward of any high-density region $x_i$ after the (t+1)-th iteration, Beta denotes the Beta distribution, $\alpha_i^{t}$ is the number of active addresses of region $x_i$ determined after the t-th iteration, $\beta_i^{t}$ is the number of inactive addresses of region $x_i$ determined after the t-th iteration, $\alpha^{*}$ is the number of newly found active addresses after the (t+1)-th iteration, and $\beta^{*}$ is the number of newly found inactive addresses after the (t+1)-th iteration. It should be understood that when the high-density regions of the seed addresses are obtained through the density space tree, any high-density region $x_i$ is a leaf node of the density space tree.
Thus, by repeatedly performing steps S2 and S3, iterative detection is performed for each high-density region, the active address density distribution (i.e., equations (2) and (3) above) is updated by the feedback reward of each iterative scan result, and the direction of target address generation is dynamically adjusted. As the number of iterations increases, the reward probability of each action is more and more accurately evaluated, and finally a high-density area of active addresses is found in the real network, and active address generation is carried out in the high-density area.
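The repeated execution of S2 and S3 can be sketched as a small simulation. This is an illustrative stand-in, not the patented procedure: the hidden `true_rates` replace real network probing, the Beta(1, 1) starting counts are assumed, and the budget is split in proportion to Thompson samples alone, which ignores the address space dimension V_i used in the application's p(x_i).

```python
import random


def probe_iteratively(true_rates, iterations, budget, seed=0):
    """Simulate S2/S3 for several rounds and return the (alpha, beta) lists."""
    rng = random.Random(seed)
    k = len(true_rates)
    alpha = [1] * k  # assumed initial active counts
    beta = [1] * k   # assumed initial inactive counts
    for _ in range(iterations):
        # Thompson sampling: draw one plausible hit rate per region.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        total = sum(samples)
        for i in range(k):
            # Assumed allocation rule: budget share proportional to the sample.
            n_i = int(budget * samples[i] / total)
            # Simulated probing: each target is active with the hidden rate.
            active = sum(rng.random() < true_rates[i] for _ in range(n_i))
            alpha[i] += active
            beta[i] += n_i - active
    return alpha, beta


alpha, beta = probe_iteratively([0.05, 0.6, 0.2], iterations=30, budget=200)
probes = [alpha[i] + beta[i] - 2 for i in range(3)]
```

In this simulation, most probes end up in the region with the highest hidden hit rate, and that region's ratio α/(α+β) moves close to its true rate, which mirrors the convergence described above.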
Further, in an embodiment of the present application, after step S4, the method further includes: and merging the leaf nodes of the density space tree to the corresponding parent nodes. As a possible implementation, merging leaf nodes of a density space tree to corresponding parent nodes includes: performing detection address merging, reward merging and space merging on leaf nodes, wherein the space merging is performed through the following formula:
$$f.\mathit{var\_space} = \bigcup_{i=1}^{j} x_i.\mathit{var\_space}$$
where f.var_space is the variable space of the parent node, $x_i$.var_space is the variable space of any leaf node, and j is the number of leaf nodes corresponding to the parent node.
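The three kinds of merging (probed addresses, rewards, and space) might look like the following Python sketch. It assumes that the variable space is a set of variable nybble positions merged by set union, and that rewards are merged by summing the active and inactive counts; both choices, and the `Node` field names, are assumptions for illustration rather than details stated in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    var_space: set = field(default_factory=set)  # variable nybble positions
    alpha: int = 0                               # active addresses found
    beta: int = 0                                # inactive addresses found
    probed: set = field(default_factory=set)     # addresses already probed


def merge_leaves(leaves):
    """Fold j leaf nodes into one parent node."""
    parent = Node()
    for leaf in leaves:
        parent.var_space |= leaf.var_space  # space merging (assumed union)
        parent.alpha += leaf.alpha          # reward merging (assumed sum)
        parent.beta += leaf.beta
        parent.probed |= leaf.probed        # probed-address merging
    return parent


parent = merge_leaves([
    Node({30, 31}, 5, 20, {"2001:db8::1"}),
    Node({29, 31}, 2, 10, {"2001:db8::100"}),
])
```

Keeping the probed-address set on the parent lets later generation in the larger space skip addresses that were already scanned in the leaves.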
In summary, according to the IPv6 active address detection method based on reinforcement learning in the embodiment of the present application, a density space tree is used to find the high-density regions of the seed addresses, and a reinforcement learning method is then used to iteratively detect each of these regions. The density distribution of the seed addresses is updated according to the rewards of the active addresses found in each iteration, so that it moves toward the actual active address distribution. This corrects the inconsistency with the active address distribution caused by sampling bias in the seed addresses, so that high-density regions of active addresses in the real network can be determined and probed, which improves the efficiency of detecting active addresses and saves detection resources.
In order to more clearly describe the specific implementation process of the reinforcement learning-based IPv6 active address detection method of the present application, a specific embodiment is described below.
As shown in FIG. 2, for the case where the target area has enough seed addresses, a reinforcement-learning-based active address detection algorithm, AddrMiner-S, is designed. Seed addresses are first collected and the high-density regions of the seed addresses are found. In each iteration, target addresses are generated in the high-density regions and probed to detect whether they are active. The expected reward of the corresponding region is dynamically updated based on the numbers of active and inactive addresses found among the target addresses. This iteratively corrects the difference between the density distributions of the seed addresses and the actual active addresses caused by sampling bias. As the number of iterations increases, the density distribution of the seed addresses gradually converges to the density distribution of the actual active addresses.
Specifically, in one embodiment of the present application, space partitioning is performed first: the high-density regions $X = \{x_1, x_2, \ldots, x_k\}$ of the seed addresses are found. To cluster the density distribution of the seed addresses quickly, a density space tree is used to find the high-density regions of the seed addresses in linear time. The root node represents the entire active address space, and the leaf nodes represent high-density seed address regions. Each node region $x_i$ has two attributes, $\alpha_i$ and $\beta_i$, where $\alpha_i$ is the number of active addresses found in region $x_i$ and $\beta_i$ is the number of inactive addresses found in region $x_i$. All leaf nodes are taken from the density space tree as the high-density region set X. After the high-density seed address regions are found, active IPv6 addresses are probed dynamically based on reinforcement learning. Each iteration of the subsequent reinforcement learning consists of three steps:
1. Generate the target addresses to be probed.
2. Update the reward of each probed region with the numbers of active and inactive addresses (the reward of the action), thereby updating the density distribution.
3. Merge the nodes of the space tree to meet the need to explore a larger address space.
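Taking all leaf nodes of the tree as the region set X amounts to a plain tree traversal. The sketch below uses a hypothetical `TreeNode` class with a `children` list; how the density space tree is actually built and split is not covered here.

```python
class TreeNode:
    """Hypothetical density space tree node; leaves have no children."""

    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []


def collect_leaves(root):
    """Return the leaf nodes in left-to-right order, visiting each node once."""
    stack, leaves = [root], []
    while stack:
        node = stack.pop()
        if node.children:
            # Push children reversed so the leftmost child is visited first.
            stack.extend(reversed(node.children))
        else:
            leaves.append(node)
    return leaves


tree = TreeNode("root", [
    TreeNode("a"),
    TreeNode("b", [TreeNode("c"), TreeNode("d")]),
])
region_set = [leaf.label for leaf in collect_leaves(tree)]
```

Each node is visited exactly once, so collecting X is linear in the size of the tree.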
Second, target addresses are generated. Specifically, to support large-scale address detection and increase the detection speed, multiple target regions are selected in each iteration, and the budget consumed per iteration is b. Since node regions with greater rewards are more likely to contain active addresses, in each iteration the distribution of active address density is evaluated using previous events (rewards of actions), based on the reward of generating target addresses in the candidate regions X. However, the larger the space, the greater the risk of searching within the node region (the harder it is to find active addresses). For example, in the extreme case of the entire IPv6 address space, the hit rate of active addresses is very low. To reduce the risk of inefficient address detection caused by an excessively large space, the variable space of the region address (the variable dimensions) is used to adjust the probability that each region generates active addresses. The number of target addresses generated per region is calculated as follows:
Figure BDA0003164959530000081
$$N(x_i) = b \cdot p(x_i)$$
The meaning of each parameter in the formula is as described in the above embodiments and is not repeated here.
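Once N(x_i) is known, target addresses have to be drawn from the region's variable space. The Python sketch below assumes a region is written as a 32-nybble pattern in which `*` marks a variable dimension; this pattern encoding, the function name, and uniform random sampling are illustrative assumptions.

```python
import ipaddress
import random


def generate_targets(pattern: str, count: int, seed: int = 0) -> list:
    """Draw up to `count` distinct IPv6 addresses matching a nybble pattern."""
    if len(pattern) != 32:
        raise ValueError("pattern must contain exactly 32 nybbles")
    rng = random.Random(seed)
    # A region with v variable nybbles holds only 16**v distinct addresses.
    count = min(count, 16 ** pattern.count("*"))
    targets = set()
    while len(targets) < count:
        nybbles = [rng.choice("0123456789abcdef") if c == "*" else c
                   for c in pattern]
        value = int("".join(nybbles), 16)
        targets.add(str(ipaddress.IPv6Address(value)))
    return sorted(targets)


# Region 2001:db8::/120 with the last two nybbles variable (256 addresses).
region = "20010db8" + "0" * 22 + "**"
targets = generate_targets(region, 10)
```

The cap on `count` reflects the point made above: a region with few variable dimensions is small and cheap to exhaust, while each extra variable nybble multiplies the space by 16.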
Third, the reward is updated. Specifically, to increase the probability that the target addresses of the next round are generated in high-density regions, the reward of each node region is updated according to the detection result in that region. After each round of detection, the reward value of every detected node region is updated according to its detection result. First, all leaf nodes are taken from the density space tree as the high-density region set X, and the reward of each leaf node x_i in X is initialized as follows:
Figure BDA0003164959530000082
where R_i denotes the reward of leaf node region x_i, Beta denotes the Beta distribution, α_i is the number of active addresses of region x_i, and β_i is the number of inactive addresses of region x_i. Initially,
Figure BDA0003164959530000091
is the number of seed addresses located in leaf node region x_i plus 1,
Figure BDA0003164959530000092
After each iteration, the expected reward of the detected region x_i is updated as follows:
Figure BDA0003164959530000093
where α* denotes the number of new active addresses found in the scan result of node region x_i, and β* denotes the number of new inactive addresses found in the scan result of node region x_i. Let b* denote the number of target addresses generated in node region x_i in each iteration; then α* and β* satisfy b* = α* + β*.
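The reward update can be sketched as simple count bookkeeping on a Beta distribution. Two points are assumptions rather than statements from this text: the initial value β_i = 1 (the initial value of β_i appears only as an image reference), and the use of Python's standard `random.betavariate` to draw a reward from Beta(α_i, β_i). The class `RegionReward` and its methods are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass
class RegionReward:
    alpha: int   # active-address count (prior pseudo-count included)
    beta: int    # inactive-address count (prior pseudo-count included)

    @classmethod
    def from_seeds(cls, seed_count: int) -> "RegionReward":
        # alpha starts at (number of seeds in the region) + 1, per the text.
        # beta = 1 is an assumption; the source gives it only as an image.
        return cls(alpha=seed_count + 1, beta=1)

    def update(self, new_active: int, new_inactive: int) -> None:
        """Fold one round of probing into the counts (b* = alpha* + beta*)."""
        self.alpha += new_active
        self.beta += new_inactive

    def expected(self) -> float:
        """Mean of Beta(alpha, beta)."""
        return self.alpha / (self.alpha + self.beta)

    def sample(self, rng: random.Random) -> float:
        """Draw a reward R_i from Beta(alpha, beta)."""
        return rng.betavariate(self.alpha, self.beta)


region = RegionReward.from_seeds(seed_count=9)   # starts as Beta(10, 1)
probed = 20                                      # b*: targets sent this round
hits = 5                                         # alpha*: targets that answered
region.update(new_active=hits, new_inactive=probed - hits)
draw = region.sample(random.Random(0))
```

A region that keeps missing accumulates β and its expected reward falls, which is how the seed-derived density estimate is pulled toward the observed density.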
Fourth, nodes are merged. Specifically, the search space of a node is defined by the variable dimensions of its seed addresses, which by itself leaves the search space incomplete. Therefore, after the space of a child node has been searched, an upward merging method is adopted so that the space not covered by the child nodes is searched in the parent node. When a leaf node region needs to be merged, all leaf nodes of the subtree T rooted at that leaf node's parent are merged, to ensure that addresses continue to be generated in high-density regions. Because the leaf nodes of T are all contained in the region set X to be searched, the child nodes of every node can be stored during tree construction, so that at merge time all leaf nodes of T are obtained simply by intersecting with X. The merging strategies for the node parameters are as follows:
(1) Probe address merging: the active addresses (α_f) and inactive addresses (β_f) found in the region of the parent node f are equal to the unions of the active and inactive address sets, respectively, of all child nodes C = {x_1, …, x_j}. The specific relationships are as follows:
Figure BDA0003164959530000094
Figure BDA0003164959530000095
(2) Reward merging: the reward value of the parent node still follows the Beta distribution; the reward is Beta(α_f, β_f), where α_f and β_f are the active and inactive address counts obtained by strategy (1).
(3) Space merging: the target address generation space of the parent node is equal to the variable space of the parent node minus the variable spaces of the child nodes. The specific relationship is as follows:
Figure BDA0003164959530000096
The meaning of each parameter in the formulas is as described in the above embodiments and is not repeated here.
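The three merging strategies can be illustrated with plain Python sets. This sketch rests on several assumptions: a region's variable space is modeled as an explicit set of integer addresses (a real implementation would more likely use address patterns), α_f and β_f are taken as the sizes of the merged sets with no prior pseudo-counts, and the names `Region`, `merge_into_parent`, and `beta_params` are hypothetical. The merging formulas themselves survive only as image references here.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set, Tuple


@dataclass
class Region:
    active: Set[int] = field(default_factory=set)      # probed, answered
    inactive: Set[int] = field(default_factory=set)    # probed, no answer
    var_space: Set[int] = field(default_factory=set)   # addresses it may generate


def merge_into_parent(parent: Region, children: Iterable[Region]) -> Region:
    children = list(children)
    # (1) Probe address merging: parent results are unions of child results.
    active = set().union(*(c.active for c in children))
    inactive = set().union(*(c.inactive for c in children))
    # (3) Space merging: generate only where no child has already searched.
    searched = set().union(*(c.var_space for c in children))
    return Region(active, inactive, parent.var_space - searched)


def beta_params(region: Region) -> Tuple[int, int]:
    # (2) Reward merging: Beta(alpha_f, beta_f) from the merged counts.
    return len(region.active), len(region.inactive)


parent = Region(var_space=set(range(16)))
child_a = Region(active={1}, inactive={0, 2, 3}, var_space={0, 1, 2, 3})
child_b = Region(active={4, 5}, inactive={6, 7}, var_space={4, 5, 6, 7})
merged = merge_into_parent(parent, [child_a, child_b])
alpha_f, beta_f = beta_params(merged)
```

After merging, the parent only generates targets in the part of its space that no child covered, so no probe is spent twice on the same address.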
In this way, each high-density region is detected iteratively, and the active address density distribution is updated through the reward fed back by each iteration's scan result. As the number of iterations increases, the density distribution of the seed addresses gradually converges to the density distribution of the actual active addresses, and merging nodes of the space tree meets the need to explore a larger address space.
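The iteration as a whole can be sketched end to end with a simulated prober. This is an illustration under assumptions, not the method of the present application: the share of the budget given to each region is taken proportional to a reward sampled from Beta(α, β) (the source's exact p(x_i) is not recoverable from this text), β starts at 1, node merging is omitted, and `probe_loop` and `is_active` are hypothetical names standing in for real network probing.

```python
import random
from typing import Callable, Dict, List, Set


def probe_loop(
    regions: Dict[str, List[int]],        # region name -> candidate addresses
    seeds_per_region: Dict[str, int],     # region name -> number of seeds
    is_active: Callable[[int], bool],     # stand-in for a real network probe
    budget: int,
    rounds: int,
    rng: random.Random,
) -> Set[int]:
    alpha = {r: seeds_per_region[r] + 1 for r in regions}   # per the text
    beta = {r: 1 for r in regions}                          # assumed initial value
    untried = {r: list(addrs) for r, addrs in regions.items()}
    found: Set[int] = set()
    for _ in range(rounds):
        # Sample one reward per region and turn the samples into budget shares.
        reward = {r: rng.betavariate(alpha[r], beta[r]) for r in regions}
        total = sum(reward.values()) or 1.0
        for r in regions:
            n = min(int(budget * reward[r] / total), len(untried[r]))
            targets = [untried[r].pop() for _ in range(n)]
            hits = [t for t in targets if is_active(t)]
            found.update(hits)
            # Feed the scan result back into the region's reward.
            alpha[r] += len(hits)
            beta[r] += len(targets) - len(hits)
    return found


# Toy network: half of the "dense" region answers, the "sparse" one never does.
regions = {"dense": list(range(100)), "sparse": list(range(1000, 1100))}
active = set(range(0, 100, 2))
found = probe_loop(
    regions, {"dense": 10, "sparse": 1}, lambda a: a in active,
    budget=40, rounds=5, rng=random.Random(0),
)
```

In this toy setup the sparse region's β grows every round while its α stays fixed, so later rounds tend to shift probes toward the dense region.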
In order to more clearly embody the beneficial effects of the reinforcement learning-based IPv6 active address probing method of the present application, an embodiment for evaluating the effects of the method in practical applications is described below.
Specifically, BGP prefixes that each contain more than 1000 active addresses are randomly selected for active address detection, with the budget set to 10 times the number of input active addresses (seed addresses). The resulting detection efficiency is shown in FIG. 3. In every BGP prefix, the reinforcement-learning-based IPv6 active address probing algorithm AddrMiner-S outperforms the active address probing algorithms of the related art (such as DET and 6Hit, shown in FIG. 3), and its active address hit rate in prefix 2001:1291::/32 reaches 35.2%.
To further verify the generality of the algorithm, 2 million active IPv6 addresses are randomly selected as seed addresses from the Hitlist published by Gasser (2021.5.8), the generation budget is set to 1 to 5 million, and the resulting active address hit rates are shown in FIG. 4. With a budget of 5 million, the hit rates of the method of the present application and of the active address detection algorithms of the related art are, from high to low: AddrMiner-S (56.9%), DET (28.9%), 6Hit (21.6%), 6Gen (14.6%), 6Tree (12.9%), and Entropy/IP (3.1%). It can thus be seen that AddrMiner-S effectively improves address detection efficiency.
In order to achieve the above object, as shown in fig. 5, a second embodiment of the present application proposes an apparatus for detecting an IPv6 active address based on reinforcement learning, which includes the following modules:
an obtaining module 100, configured to obtain an IPv6 seed address, and determine multiple high-density areas of the seed address;
a detection module 200, configured to detect each of the high-density regions through a pre-trained multi-armed bandit model, the detection module being specifically configured to: generate a preset number of target addresses in each high-density area, and detect whether each target address is an active address;
an updating module 300, configured to determine an active address number and an inactive address number in the preset number of target addresses, and update an expected reward of a corresponding high-density area according to the active address number and the inactive address number;
an iteration module 400, configured to control the detection module and the update module to repeatedly run so as to converge the density distribution of the seed addresses to the density distribution of the active addresses by performing iterative detection on each high-density region.
Optionally, in an embodiment of the present application, the obtaining module 100 determines a plurality of high-density regions of the seed address through a density space tree, where a root node of the density space tree represents an active address space, and a leaf node of the density space tree represents a high-density region of the seed address, and the address detecting apparatus further includes: a merging module, configured to merge the leaf nodes of the density space tree into the corresponding parent nodes.
Optionally, in an embodiment of the present application, the merging module is further configured to perform probe address merging, reward merging, and space merging on the leaf nodes, where the merging module is specifically configured to perform space merging according to the following formula:
Figure BDA0003164959530000101
where f.var_space is the variable space of the parent node, x_i.var_space is the variable space of any leaf node, and j is the number of leaf nodes corresponding to the parent node.
Optionally, in an embodiment of the present application, the detection module 200 is further configured to calculate the preset number of target addresses by the following formula:
N(x_i) = b · p(x_i)
wherein,
Figure BDA0003164959530000111
where N(x_i) is the preset number of target addresses of any high-density region x_i, b denotes the budget consumed in each iteration of probing, p(x_i) denotes the probability of generating target addresses in any high-density region x_i, R_i denotes the expected reward of any high-density region x_i, V_i denotes the variable dimension of any high-density region x_i, and n denotes the preset detection region value of said any high-density region x_i.
Optionally, in an embodiment of the present application, the updating module 300 is specifically configured to update the desired reward of the corresponding high-density area by the following formula:
Figure BDA0003164959530000112
wherein,
Figure BDA0003164959530000113
denotes the expected reward of any high-density region x_i after the (t+1)-th iteration, Beta denotes the Beta distribution,
Figure BDA0003164959530000114
denotes the number of active addresses of any high-density region x_i determined after the t-th iteration,
Figure BDA0003164959530000115
denotes the number of inactive addresses of any high-density region x_i determined after the t-th iteration, α* denotes the number of newly generated active addresses after the (t+1)-th iteration, and β* denotes the number of newly generated inactive addresses after the (t+1)-th iteration.
In summary, the reinforcement-learning-based IPv6 active address detection apparatus of the embodiment of the present application uses a density space tree to find the high-density regions of the seed addresses, and then uses a reinforcement learning method to iteratively detect each high-density region so generated. In each iteration, the density distribution of the seed addresses is updated according to the reward from the active addresses found, so that it moves toward the actual active address distribution. This corrects the mismatch with the active address distribution caused by sampling bias in the seed addresses, determines the high-density regions of active addresses in the real network, and concentrates address detection in those regions, which improves the efficiency of active address detection and saves detection resources.
In order to implement the foregoing embodiments, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements a reinforcement-learning-based IPv6 active address detection method according to the embodiment of the first aspect of the present application.
Although the present application has been disclosed in detail with reference to the accompanying drawings, it is to be understood that such description is merely illustrative and not restrictive of the application of the present application. The scope of the present application is defined by the appended claims and may include various modifications, adaptations, and equivalents of the invention without departing from the scope and spirit of the application.
In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, various embodiments or examples and features of different embodiments or examples described in this specification can be combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. An IPv6 active address detection method based on reinforcement learning is characterized by comprising the following steps:
S1: acquiring an IPv6 seed address, and determining a plurality of high-density areas of the seed address;
S2: detecting each of the high density regions through a pre-trained multi-armed bandit model, comprising: generating a preset number of target addresses in each high-density area, and detecting whether each target address is an active address;
S3: determining the number of active addresses and the number of inactive addresses in the preset number of target addresses, and updating the expected reward of the corresponding high-density area according to the number of the active addresses and the number of the inactive addresses;
S4: repeatedly performing the steps S2 and S3 to converge the density distribution of the seed addresses to the density distribution of the active addresses by iteratively detecting each of the high-density regions.
2. The address detection method according to claim 1, wherein a plurality of high-density regions of the seed address are determined by a density space tree, wherein a root node of the density space tree represents an active address space, and a leaf node of the density space tree represents a high-density region of the seed address, and after step S4, the method further comprises:
and merging the leaf nodes of the density space tree to the corresponding parent nodes.
3. The address detection method of claim 2, wherein merging leaf nodes of the density space tree into corresponding parent nodes comprises: performing probe address merging, reward merging and space merging on leaf nodes, wherein the space merging is performed by the following formula:
Figure FDA0003164959520000011
where f.var_space is the variable space of the parent node, x_i.var_space is the variable space of any leaf node, and j is the number of leaf nodes corresponding to the parent node.
4. The address detection method according to claim 1 or 2, wherein the preset number of target addresses is calculated by the following formula:
N(x_i) = b · p(x_i)
wherein,
Figure FDA0003164959520000012
where N(x_i) is the preset number of target addresses of any high-density region x_i, b denotes the budget consumed in each iteration of probing, p(x_i) denotes the probability of generating target addresses in any high-density region x_i, R_i denotes the expected reward of any high-density region x_i, V_i denotes the variable dimension of any high-density region x_i, and n denotes the preset detection region value of said any high-density region x_i.
5. The address detection method of claim 1, wherein the expected reward of the corresponding high-density area is updated by the following formula:
Figure FDA0003164959520000021
wherein,
Figure FDA0003164959520000022
denotes the expected reward of any high-density region x_i after the (t+1)-th iteration, Beta denotes the Beta distribution,
Figure FDA0003164959520000023
denotes the number of active addresses of any high-density region x_i determined after the t-th iteration,
Figure FDA0003164959520000024
denotes the number of inactive addresses of any high-density region x_i determined after the t-th iteration, α* denotes the number of newly generated active addresses after the (t+1)-th iteration, and β* denotes the number of newly generated inactive addresses after the (t+1)-th iteration.
6. An apparatus for detecting IPv6 active address based on reinforcement learning, comprising:
the acquisition module is used for acquiring an IPv6 seed address and determining a plurality of high-density areas of the seed address;
a detection module for detecting each of the high-density regions through a pre-trained multi-armed bandit model, the detection module being specifically configured to: generate a preset number of target addresses in each high-density area, and detect whether each target address is an active address;
the updating module is used for determining the number of active addresses and the number of inactive addresses in the preset number of target addresses and updating the expected reward of the corresponding high-density area according to the number of the active addresses and the number of the inactive addresses;
and the iteration module is used for controlling the detection module and the updating module to repeatedly run so as to make the density distribution of the seed addresses converge to the density distribution of the active addresses by performing iteration detection on each high-density area.
7. The address detection apparatus of claim 6, wherein the obtaining module determines a plurality of high-density regions of the seed address through a density space tree, wherein a root node of the density space tree represents an active address space and leaf nodes of the density space tree represent high-density regions of the seed address, the address detection apparatus further comprising:
a merging module, configured to merge the leaf nodes of the density space tree into the corresponding parent nodes.
8. The address detection apparatus according to claim 7, wherein the merging module is further configured to perform a detection address merging, a reward merging, and a space merging on leaf nodes, wherein the merging module is specifically configured to perform the space merging according to the following formula:
Figure FDA0003164959520000025
where f.var_space is the variable space of the parent node, x_i.var_space is the variable space of any leaf node, and j is the number of leaf nodes corresponding to the parent node.
9. The address detection apparatus of claim 7, wherein the detection module is further configured to calculate the preset number of target addresses by the following formula:
N(x_i) = b · p(x_i)
wherein,
Figure FDA0003164959520000026
where N(x_i) is the preset number of target addresses of any high-density region x_i, b denotes the budget consumed in each iteration of probing, p(x_i) denotes the probability of generating target addresses in any high-density region x_i, R_i denotes the expected reward of any high-density region x_i, V_i denotes the variable dimension of any high-density region x_i, and n denotes the preset detection region value of said any high-density region x_i.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the reinforcement learning based IPv6 active address probing method according to any one of claims 1-5.
CN202110801982.4A 2021-07-15 2021-07-15 IPv6 active address detection method and device based on reinforcement learning Active CN113746947B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110801982.4A CN113746947B (en) 2021-07-15 2021-07-15 IPv6 active address detection method and device based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110801982.4A CN113746947B (en) 2021-07-15 2021-07-15 IPv6 active address detection method and device based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN113746947A true CN113746947A (en) 2021-12-03
CN113746947B CN113746947B (en) 2022-05-06

Family

ID=78728654

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110801982.4A Active CN113746947B (en) 2021-07-15 2021-07-15 IPv6 active address detection method and device based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN113746947B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109039797A (en) * 2018-06-11 2018-12-18 电子科技大学 Big stream detection method based on intensified learning
CN109905497A (en) * 2019-03-05 2019-06-18 长沙学院 A kind of IPv6 active address Dynamic Discovery method
CN111432043A (en) * 2020-03-09 2020-07-17 清华大学 Dynamic IPv6 address detection method based on density
CN112653764A (en) * 2020-12-24 2021-04-13 清华大学 IPv6 service detection method and system, electronic equipment and storage medium
CN112398969A (en) * 2021-01-19 2021-02-23 中国人民解放军国防科技大学 IPv6 address dynamic detection method and device and computer equipment

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
CHEN SU,等: "A study on the distribution of active IPv6 addresses used by websites", 《2019 IEEE 8TH JOINT INTERNATIONAL INFORMATION TECHNOLOGY AND ARTIFICIAL INTELLIGENCE CONFERENCE (ITAIC)》 *
GUANGLEI SONG等: "Towards the Construction of Global IPv6 Hitlist and Efficient Probing of IPv6 Address Space", 《2020 IEEE/ACM 28TH INTERNATIONAL SYMPOSIUM ON QUALITY OF SERVICE (IWQOS)》 *
余力,等: "基于强化学习的推荐研究综述", 《计算机科学》 *
左志昊等: "活跃IPv6地址前缀的预测算法", 《通信学报》 *
李果;等: "基于多层级分类和空间建模的IPv6活跃地址发现算法", 《清华大学学报(自然科学版)》 *
李果等: "基于种子地址的IPv6地址探测技术综述", 《电信科学》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114157637A (en) * 2022-02-09 2022-03-08 中国人民解放军国防科技大学 IPv6 address scanning method and device, computer equipment and storage medium
CN114157637B (en) * 2022-02-09 2022-04-22 中国人民解放军国防科技大学 IPv6 address scanning method and device, computer equipment and storage medium
CN115297036A (en) * 2022-08-12 2022-11-04 北京华顺信安科技有限公司 IPv6 address intelligent analysis-based network space map drawing method and system
CN115297036B (en) * 2022-08-12 2023-09-05 北京华顺信安科技有限公司 IPv6 address intelligent analysis-based network space map drawing method and system
CN115208800A (en) * 2022-09-16 2022-10-18 清华大学 Whole internet port scanning method and device based on reinforcement learning
CN115208800B (en) * 2022-09-16 2023-01-03 清华大学 Whole internet port scanning method and device based on reinforcement learning
CN118381821A (en) * 2024-06-26 2024-07-23 中国人民解放军国防科技大学 IPv6 network edge detection method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113746947B (en) 2022-05-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant