CN113746947A

CN113746947A - IPv6 active address detection method and device based on reinforcement learning

Info

Publication number: CN113746947A
Application number: CN202110801982.4A
Authority: CN
Inventors: 杨家海; 宋光磊; 何林; 王之梁
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2021-07-15
Filing date: 2021-07-15
Publication date: 2021-12-03
Anticipated expiration: 2041-07-15
Also published as: CN113746947B

Abstract

The application provides an IPv6 active address detection method and device based on reinforcement learning. The method comprises the following steps: acquiring IPv6 seed addresses and determining a plurality of high-density regions of the seed addresses; iteratively detecting each high-density region through a pre-trained multi-armed bandit model, which comprises: generating a preset number of target addresses in each high-density region and detecting whether each target address is an active address; determining the number of active addresses and the number of inactive addresses among the preset number of target addresses, and updating the expected reward of the corresponding high-density region according to these two numbers; and repeating the above steps so that, through iterative detection of each high-density region, the density distribution of the seed addresses converges to the density distribution of the active addresses. The method moves the density distribution of the seed addresses toward the actual active address distribution, so that high-density regions of active addresses can be determined in the network and the efficiency of detecting active addresses is improved.

Description

IPv6 active address detection method and device based on reinforcement learning

Technical Field

The application relates to the technical field of computer networks, in particular to an IPv6 active address detection method and device based on reinforcement learning.

Background

Currently, when detecting active IPv6 addresses in a target area, if there are enough seed addresses in the target area, a target address generation algorithm based on the seed addresses can generally be used. Assuming that the active address seeds are sampled uniformly in the target area, the density distribution of the seed addresses is consistent with the density distribution of the actual active addresses in the area, and the higher the density of seed addresses in a region, the higher the probability of detecting active addresses there. It is therefore desirable to find high-density regions of seed addresses and perform address detection in those regions, so as to detect a large number of active addresses.

However, due to the influence of factors such as sampling deviation of the seed address, the density distribution of the seed address may not be consistent with the density distribution of the actual active address in the target area, so that the address detection method in the related art may perform address detection in a plurality of areas with low density of the actual active address, thereby reducing detection efficiency and wasting detection resources.

Disclosure of Invention

The present application is directed to solving, at least to some extent, one of the technical problems in the related art.

Therefore, a first objective of the present application is to provide an IPv6 active address detection method based on reinforcement learning, where the method updates the density distribution of seed addresses according to rewards of active addresses in each iterative detection, and moves the density distribution of seed addresses to an actual active address distribution, so as to determine a high-density region of active addresses in a real network, and perform address detection in the high-density region, thereby improving efficiency of detecting active addresses and saving detection resources.

The second purpose of the invention is to provide an IPv6 active address detection device based on reinforcement learning.

A third object of the invention is to propose a non-transitory computer-readable storage medium.

In order to achieve the above object, an embodiment of the first aspect of the present invention provides a reinforcement learning-based IPv6 active address detection method, including the following steps:

S1: acquiring IPv6 seed addresses, and determining a plurality of high-density regions of the seed addresses;

S2: detecting each of the high-density regions through a pre-trained multi-armed bandit model, comprising: generating a preset number of target addresses in each high-density region, and detecting whether each target address is an active address;

S3: determining the number of active addresses and the number of inactive addresses among the preset number of target addresses, and updating the expected reward of the corresponding high-density region according to the number of active addresses and the number of inactive addresses;

S4: repeatedly performing steps S2 and S3, so that the density distribution of the seed addresses converges to the density distribution of the active addresses through iterative detection of each high-density region.

Optionally, in an embodiment of the present application, the plurality of high-density regions of the seed addresses are determined through a density space tree, where the root node of the density space tree represents the active address space and each leaf node represents a high-density region of the seed addresses; after step S4, the method further includes: merging the leaf nodes of the density space tree into their corresponding parent nodes.

Optionally, in an embodiment of the present application, merging leaf nodes of the density space tree into corresponding parent nodes includes: performing probe address merging, reward merging and space merging on leaf nodes, wherein the space merging is performed by the following formula:

where f.var_space is the variable space of the parent node, x_i.var_space is the variable space of any leaf node, and j is the number of leaf nodes corresponding to the parent node.

Optionally, in an embodiment of the present application, the preset number of the target addresses is calculated by the following formula:

N(x_i)＝b*p(x_i)

wherein,

wherein N(x_i) is the preset number of target addresses of any high-density region x_i, b represents the budget consumed per iteration of probing, p(x_i) represents the probability of generating target addresses in the high-density region x_i, R_i represents the expected reward of the high-density region x_i, V_i represents the address space dimension of the high-density region x_i, and n represents the preset number of detection regions.

Optionally, in one embodiment of the present application, the desired reward for the corresponding high-density zone is updated by the following formula:

R_i^{t+1} = Beta(α_i^t + α^*, β_i^t + β^*)

wherein R_i^{t+1} represents the expected reward of any high-density region x_i after the (t+1)-th iteration, Beta represents the Beta distribution, α_i^t represents the number of active addresses of the high-density region x_i determined after the t-th iteration, β_i^t represents the number of inactive addresses of the high-density region x_i determined after the t-th iteration, α^* denotes the number of newly generated active addresses after the (t+1)-th iteration, and β^* denotes the number of newly generated inactive addresses after the (t+1)-th iteration.

In order to achieve the above purpose, an embodiment of the second aspect of the present application provides an apparatus for detecting an IPv6 active address based on reinforcement learning, including the following modules:

the acquisition module is used for acquiring an IPv6 seed address and determining a plurality of high-density areas of the seed address;

a detection module for detecting each of the high-density regions through a pre-trained multi-armed bandit model, the detection module being specifically configured to: generate a preset number of target addresses in each high-density region, and detect whether each target address is an active address;

the updating module is used for determining the number of active addresses and the number of inactive addresses in the preset number of target addresses and updating the expected reward of the corresponding high-density area according to the number of the active addresses and the number of the inactive addresses;

and the iteration module is used for controlling the detection module and the updating module to repeatedly run so as to make the density distribution of the seed addresses converge to the density distribution of the active addresses by performing iteration detection on each high-density area.

Optionally, in an embodiment of the present application, the obtaining module determines a plurality of high-density regions of the seed address through a density space tree, where a root node of the density space tree represents an active address space, and a leaf node of the density space tree represents a high-density region of the seed address, and the address detection apparatus further includes:

and the merging module is used for merging the leaf nodes of the density space tree to the corresponding father nodes.

Optionally, in an embodiment of the present application, the merging module is further configured to perform probe address merging, reward merging, and space merging on the leaf nodes, where the merging module is specifically configured to perform the space merging according to the following formula:

Optionally, in an embodiment of the present application, the detection module is further configured to calculate the preset number of the target addresses by the following formula:

N(x_i)＝b*p(x_i)

wherein,

Optionally, in an embodiment of the present application, the updating module is specifically configured to update the desired reward of the corresponding high-density area by the following formula:

wherein,

β_i^t represents the number of inactive addresses of any high-density region x_i determined after the t-th iteration, α^* denotes the number of newly generated active addresses after the (t+1)-th iteration, and β^* denotes the number of newly generated inactive addresses after the (t+1)-th iteration.

The technical effects of this application: according to the method, the high-density region of the seed address is found by using the density space tree, and then each generated high-density region of the seed address is subjected to iterative detection by using a reinforcement learning method. And updating the density distribution of the seed addresses according to the rewards of the active addresses in each iterative detection, so that the density distribution of the seed addresses moves to the actual active address distribution to correct the problem of inconsistency with the active address distribution caused by sampling deviation of the seed addresses, thereby determining a high-density area of the active addresses in a real network and carrying out address detection in the high-density area, thereby improving the efficiency of detecting the active addresses and saving detection resources.

To achieve the above object, a non-transitory computer-readable storage medium is provided in an embodiment of the third aspect of the present application, and a computer program is stored on the non-transitory computer-readable storage medium, and when executed by a processor, the computer program implements the reinforcement learning based IPv6 active address detection method described in the embodiment of the first aspect of the present application.

Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a flowchart of an IPv6 active address probing method based on reinforcement learning according to an embodiment of the present application;

FIG. 2 is a diagram illustrating a specific IPv6 active address probing workflow provided by an embodiment of the present application;

FIG. 3 is a schematic diagram illustrating a comparison effect of a BGP prefix space algorithm provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of a Gasser data set algorithm comparison effect provided by an embodiment of the present application;

FIG. 5 is a schematic structural diagram of an IPv6 active address detection apparatus based on reinforcement learning according to an embodiment of the present application.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or to elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative, are intended to explain the invention, and are not to be construed as limiting the invention.

The method and the device for detecting the active address of the IPv6 based on reinforcement learning according to the embodiments of the present application are described below with reference to the accompanying drawings.

Fig. 1 is a flowchart of an IPv6 active address detection method based on reinforcement learning according to an embodiment of the present application, where the method includes the following steps, as shown in fig. 1:

S1: IPv6 seed addresses are obtained, and a plurality of high-density regions of the seed addresses are determined.

When detecting Internet Protocol Version 6 (IPv6) active addresses in a target area, the present application first collects seed addresses in the target area. In a specific implementation, the seed addresses in the target area may be obtained through any of the schemes for collecting IPv6 seed addresses in the related art, which are not described in detail here.

Further, in an embodiment of the present application, the plurality of high-density regions of the obtained seed addresses may be found in linear time through a density space tree, where the root node of the density space tree represents the active address space and the leaf nodes of the density space tree represent high-density regions of the seed addresses.
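The application does not spell out how the density space tree is constructed. Purely as an illustration, the following sketch assumes a simple hierarchical split: seed addresses are expanded to 32 hexadecimal nybbles and recursively grouped on the leftmost nybble position at which they differ, until a node holds at most max_leaf seeds. The names to_nybbles, build_tree, leaf_regions and max_leaf are hypothetical, and this construction makes no claim to the linear-time bound mentioned above.

```python
import ipaddress


def to_nybbles(addr):
    """Expand an IPv6 address string into its 32 hexadecimal nybbles."""
    return ipaddress.IPv6Address(addr).exploded.replace(":", "")


def build_tree(seeds, max_leaf=2):
    """Recursively split seeds (32-char nybble strings) into a tree.

    ASSUMPTION: splitting on the leftmost differing nybble is an
    illustrative choice, not the construction used by the application.
    """
    node = {"seeds": seeds, "split_pos": None, "children": []}
    if len(seeds) <= max_leaf:
        return node  # small enough: treat as a high-density leaf region
    for pos in range(32):
        if len({s[pos] for s in seeds}) > 1:
            break  # leftmost position where the seeds disagree
    else:
        return node  # all seeds identical, nothing to split on
    groups = {}
    for s in seeds:
        groups.setdefault(s[pos], []).append(s)
    node["split_pos"] = pos
    node["children"] = [build_tree(g, max_leaf) for _, g in sorted(groups.items())]
    return node


def leaf_regions(node):
    """Collect the leaf nodes, i.e. the candidate high-density regions."""
    if not node["children"]:
        return [node]
    out = []
    for child in node["children"]:
        out.extend(leaf_regions(child))
    return out
```

With four seeds under 2001:db8::/32 that differ in the third 16-bit group, the root splits once and yields two leaf regions of two seeds each.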

S2: each high-density region is detected through a pre-trained multi-armed bandit model, comprising: generating a preset number of target addresses in each high-density region, and detecting whether each target address is an active address.

A multi-armed bandit model is trained in advance, that is, the model is established by reinforcement learning. After the plurality of high-density regions of the seed addresses are determined, the multi-armed bandit model based on Thompson sampling is used to dynamically update the density distribution of the active address space. In other words, each implementation step and formula of the active address detection of this method is determined from the characteristics of the trained multi-armed bandit model, which corrects the inconsistency between the density distribution of the seed addresses and the density distribution of the actual active addresses.

In one embodiment of the present application, the pre-trained multi-armed bandit model is specified as follows in one possible implementation. The IPv6 address space is divided into different density regions X = {x_1, x_2, …, x_k}, and each region is an arm of the multi-armed bandit, which yields k actions A = {a_1, a_2, …, a_k}, where a_i refers to scanning x_i and detecting whether the target address is active, i ∈ [1, k], and θ = {θ_1, θ_2, …, θ_k} denotes the average rewards. The reward of each arm follows a Bernoulli distribution with θ_i as its parameter:

wherein r is a random variable: executing action a_i yields a reward of 1 with probability θ_i and a reward of 0 with probability 1 − θ_i, so θ_i may be interpreted as the success probability or the average reward of the action.

It should be noted that the model places an independent prior distribution, namely a Beta distribution, on each θ_i, with parameters A = {α_1, …, α_k} and B = {β_1, …, β_k}. For each action a_k, the prior probability density function is:

wherein Γ is the gamma function, and the distribution is updated according to Bayes' rule as observations are collected. Because the Beta distribution is conjugate to the Bernoulli distribution, the posterior distribution of each action is also a Beta distribution, whose parameters can be updated according to the following equation (2):

that is, a region x_i is first selected and a target address in it is scanned; if the target address is found to be active (reward = 1), 1 is added to the corresponding α_i (β_i remains unchanged); otherwise (reward = 0), 1 is added to the corresponding β_i (α_i remains unchanged). Here α_i denotes the number of active addresses detected in region x_i and α_i + β_i denotes the address budget consumed by region x_i, so the active address hit rate of region x_i is α_i/(α_i + β_i). It follows that the reward of x_i is proportional to its active address hit rate.
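For orientation, the standard Beta-Bernoulli relations that this description relies on are written out below. These are the textbook forms used with Thompson sampling and are offered as a sketch, not as a verbatim copy of the numbered equations of the application; the symbol r_t for the reward observed at step t is an assumption introduced here.

```latex
% Bernoulli reward of arm a_i
P(r \mid \theta_i) = \theta_i^{\,r}\,(1-\theta_i)^{1-r}, \qquad r \in \{0, 1\}

% Beta prior over \theta_i
p(\theta_i) = \frac{\Gamma(\alpha_i+\beta_i)}{\Gamma(\alpha_i)\,\Gamma(\beta_i)}\,
              \theta_i^{\alpha_i-1}\,(1-\theta_i)^{\beta_i-1}

% Conjugate update after observing reward r_t on arm i
(\alpha_i, \beta_i) \leftarrow (\alpha_i + r_t,\; \beta_i + 1 - r_t)

% Posterior mean, which plays the role of the region's hit rate
\mathbb{E}[\theta_i] = \frac{\alpha_i}{\alpha_i+\beta_i}
```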

It should be noted that a key issue in IPv6 active address probing is achieving a high active address hit rate within a given budget. Assuming that the probing budget is B and the objective function of address probing is f, active address probing is a combinatorial optimization problem (X, B, f) that must satisfy the following relations:

where n represents the number of regions in which target addresses are generated and the objective function represents the hit rate of active addresses within the target address budget B. The objective function f is determined by the following equation (5):

Among the solutions satisfying the above formula, the most effective is the solution x^*, which satisfies f(x^*) ≥ f(x).

Therefore, each determined high-density region of the seed addresses is detected in turn through the multi-armed bandit model, so that each high-density region can be detected iteratively, the expected reward of the corresponding high-density region can be updated, and the density distribution of the seed addresses can be updated in turn.
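A single step of Thompson sampling over the regions can be sketched as follows. This is a minimal single-probe illustration of the mechanism described above, not the application's implementation; the function name thompson_step and the probe callback are hypothetical stand-ins for generating and scanning one target address in a region.

```python
import random


def thompson_step(alpha, beta, probe, rng):
    """Run one Thompson-sampling step over k regions.

    alpha[i] and beta[i] are the Beta parameters of region i.
    probe(i) is a hypothetical callback that returns True when a
    generated target address in region i turns out to be active.
    """
    # Draw one plausible hit rate per region from its Beta posterior.
    samples = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
    # Play the arm whose sampled hit rate is highest.
    i = max(range(len(samples)), key=samples.__getitem__)
    if probe(i):
        alpha[i] += 1  # reward 1: active address found
    else:
        beta[i] += 1   # reward 0: address inactive
    return i
```

Each call consumes one unit of budget and changes exactly one of the two parameters of the chosen region.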

In an embodiment of the present application, the preset number of generated target addresses may be set according to actual needs, for example, the preset number is determined according to an affordable risk degree of current address detection. As a possible implementation, the preset number of target addresses may be calculated by the following formula:

N(x_i)＝b*p(x_i)

wherein,

wherein N(x_i) is the preset number of target addresses of any high-density region x_i, b represents the budget consumed per iteration of probing, p(x_i) represents the probability of generating target addresses in the high-density region x_i, R_i represents the expected reward of the high-density region x_i, V_i represents the address space dimension of the high-density region x_i, and n represents the preset number of detection regions.
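The budget split N(x_i) = b*p(x_i) can be sketched as below. The expression for p(x_i) is not reproduced in this text, so the sketch assumes p(x_i) is proportional to R_i / V_i and normalized over the n regions; that weighting, the function name allocate_budget and the rounding-down rule are all assumptions made for illustration.

```python
def allocate_budget(b, rewards, dims):
    """Split a per-iteration budget b across regions.

    rewards[i] is R_i (expected reward of region i) and dims[i] is V_i
    (number of variable address dimensions, assumed >= 1).
    ASSUMPTION: p(x_i) is taken to be proportional to R_i / V_i.
    """
    weights = [r / v for r, v in zip(rewards, dims)]
    total = sum(weights)
    probs = [w / total for w in weights]
    # N(x_i) = b * p(x_i), rounded down to a whole number of addresses.
    return [int(b * p) for p in probs]
```

Under this assumption, two regions with equal reward but different variable dimensions receive different shares, with the smaller space favored, which matches the stated intent of limiting the risk of searching overly large spaces.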

Further, whether each generated target address is active may be detected by any related-art method for detecting whether an address is active, which is not described again here.

S3: determining the number of active addresses and the number of inactive addresses among the preset number of target addresses, and updating the expected reward of the corresponding high-density region according to the number of active addresses and the number of inactive addresses.

Specifically, in each iteration detection of each high-density area, whether each target address generated in the current high-density area is an active address or an inactive address is detected in sequence, and the number of active addresses and the number of inactive addresses in the target addresses generated by the high-density area are determined.

Further, in one embodiment of the present application, the desired reward for the corresponding high-density zone may be updated by the following formula:

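R_i^{t+1} = Beta(α_i^t + α^*, β_i^t + β^*)

wherein R_i^{t+1} represents the expected reward of any high-density region x_i after the (t+1)-th iteration, Beta represents the Beta distribution, α_i^t represents the number of active addresses of the high-density region x_i determined after the t-th iteration, β_i^t represents the number of inactive addresses of the high-density region x_i determined after the t-th iteration, α^* denotes the number of newly generated active addresses after the (t+1)-th iteration, and β^* denotes the number of newly generated inactive addresses after the (t+1)-th iteration. It should be understood that when the plurality of high-density regions of the seed addresses are obtained through the density space tree, any high-density region x_i is a leaf node of the density space tree.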
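The per-region batch update in step S3 amounts to adding the new active and inactive counts to the two Beta parameters. A minimal sketch, assuming a hypothetical RegionReward class that stores only the two counts:

```python
from dataclasses import dataclass


@dataclass
class RegionReward:
    alpha: int  # active addresses counted so far
    beta: int   # inactive addresses counted so far

    def update(self, results):
        """Fold one iteration's probe results (True = active) into the counts."""
        new_active = sum(1 for r in results if r)     # alpha^*
        new_inactive = len(results) - new_active      # beta^*
        self.alpha += new_active
        self.beta += new_inactive

    def expected(self):
        """Mean of Beta(alpha, beta), used as the region's expected reward."""
        return self.alpha / (self.alpha + self.beta)
```

For example, a region starting at (3, 1) that sees 5 active and 15 inactive targets in one round moves to (8, 16), and its expected reward drops from 0.75 to 1/3.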

Thus, by repeatedly performing steps S2 and S3, iterative detection is performed for each high-density region, the active address density distribution (i.e., equations (2) and (3) above) is updated by the feedback reward of each iterative scan result, and the direction of target address generation is dynamically adjusted. As the number of iterations increases, the reward probability of each action is more and more accurately evaluated, and finally a high-density area of active addresses is found in the real network, and active address generation is carried out in the high-density area.
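The repeated execution of S2 and S3 can be illustrated with a small simulation. Everything here is a sketch under stated assumptions: probing is simulated with fixed true hit rates instead of real scans, the initial β of 1 is assumed, and the budget is split in proportion to one Thompson sample per region, which is a simplification of the allocation formula. The function name run_probing and its parameters are hypothetical.

```python
import random


def run_probing(true_rates, seed_counts, budget_per_iter, iterations, seed=0):
    """Simulate iterative probing of regions with a Beta-Bernoulli bandit."""
    rng = random.Random(seed)
    # Prior from the seed set: alpha = seeds + 1; beta = 1 is an ASSUMPTION.
    alpha = [c + 1 for c in seed_counts]
    beta = [1] * len(seed_counts)
    for _ in range(iterations):
        # S2: sample a reward per region and split the budget accordingly.
        sampled = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
        total = sum(sampled)
        counts = [int(budget_per_iter * s / total) for s in sampled]
        for i, n in enumerate(counts):
            # Simulated scan: each target is active with probability true_rates[i].
            hits = sum(rng.random() < true_rates[i] for _ in range(n))
            # S3: batch posterior update with the new active/inactive counts.
            alpha[i] += hits
            beta[i] += n - hits
    return alpha, beta
```

In a run where the seed set over-represents a sparse region (many seeds, low true hit rate) and under-represents a dense one, the posterior means move away from the seed-derived ordering toward the simulated true hit rates, which is the correction of sampling bias described above.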

Further, in an embodiment of the present application, after step S4, the method further includes: merging the leaf nodes of the density space tree into their corresponding parent nodes. As a possible implementation, merging the leaf nodes of the density space tree into the corresponding parent nodes includes: performing probe address merging, reward merging and space merging on the leaf nodes, wherein the space merging is performed through the following formula:

In summary, according to the IPv6 active address detection method based on reinforcement learning in the embodiment of the present application, a density space tree is used to find the high-density regions of the seed addresses, and a reinforcement learning method is then used to iteratively detect each generated high-density region. In each iterative detection, the density distribution of the seed addresses is updated according to the rewards of the active addresses, so that it moves toward the actual active address distribution and the inconsistency caused by sampling bias of the seed addresses is corrected. High-density regions of active addresses in the real network are thereby determined and address detection is performed in those regions, which improves the efficiency of detecting active addresses and saves detection resources.

In order to more clearly describe the specific implementation process of the reinforcement learning-based IPv6 active address detection method of the present application, a specific embodiment is described below.

As shown in FIG. 2, for the case where the target area has enough seed addresses, a reinforcement-learning-based active address detection algorithm, AddrMiner-S, is designed. Seed addresses are first collected and the high-density regions of the seed addresses are found. In each iteration, target addresses are generated in the high-density regions and each target address is probed to determine whether it is active. The expected reward of the corresponding region is then dynamically updated based on the numbers of active and inactive addresses detected among the target addresses. In this way, the difference between the density distribution of the seed addresses and that of the actual active addresses, which is caused by sampling bias, is corrected iteratively. As the number of iterations increases, the density distribution of the seed addresses gradually converges to the density distribution of the actual active addresses.

Specifically, in one embodiment of the present application, space partitioning is performed first. The high-density regions X = {x_1, x_2, …, x_k} of the seed addresses are found. To cluster the density-space distribution of the seed addresses quickly, a density space tree is used to find the high-density regions of the seed addresses in linear time, where the root node represents the entire active address space and the leaf nodes represent high-density seed address regions. Each node region x_i has two attributes, α_i and β_i, where α_i represents the number of active addresses found in region x_i and β_i represents the number of inactive addresses found in region x_i. All leaf nodes are taken out of the density space tree as the high-density region set X, and once the high-density seed address regions have been found, active IPv6 addresses are detected dynamically based on reinforcement learning. Each subsequent reinforcement learning iteration comprises: 1. generating the target addresses to be detected; 2. updating the reward of the detected region with the numbers of active and inactive addresses (the reward of the action), thereby updating the density distribution; 3. merging nodes of the space tree to meet the need to explore a larger address space.
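One way to represent a node region with its two attributes, and to take all leaf nodes out of the tree as the set X, is sketched below. The class name SpaceNode, the label field and the default counts of 1 are hypothetical choices for illustration, not structures given by the application.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SpaceNode:
    label: str     # hypothetical human-readable name of the region
    alpha: int = 1  # number of active addresses found in the region
    beta: int = 1   # number of inactive addresses found in the region
    children: List["SpaceNode"] = field(default_factory=list)
    # Parent link is excluded from repr/compare to avoid infinite recursion.
    parent: Optional["SpaceNode"] = field(default=None, repr=False, compare=False)

    def add_child(self, child):
        """Attach a child node and set its parent pointer."""
        child.parent = self
        self.children.append(child)
        return child


def leaves(node):
    """Return all leaf nodes under node, left to right (the region set X)."""
    if not node.children:
        return [node]
    out = []
    for child in node.children:
        out.extend(leaves(child))
    return out
```

Keeping a parent pointer on each node is what later allows a leaf to be merged upward into the subtree rooted at its parent.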

Second, target addresses are generated. Specifically, to accommodate large-scale address detection and increase detection speed, multiple target regions are selected in each iteration, with a consumed budget of b. Since node regions with greater rewards are more likely to yield active addresses, in each iteration the active address density distribution is evaluated from previous events (the rewards of actions), based on the rewards of generating target addresses in the candidate regions X. However, the larger the space, the greater the risk of searching within the node region (the harder it is to find active addresses); in the extreme case of the entire IPv6 address space, the hit rate of active addresses is very low. To reduce the risk of inefficient address detection caused by an excessively large space, the variable space of the region's addresses (the variable dimension) is used to adjust the probability that each region generates active addresses. The number of target addresses generated per region is calculated as follows:

N(x_i)＝b*p(x_i)

the meaning of each parameter in the formula is as described in the above embodiments, and is not described herein again.

Third, the reward is updated. Specifically, in order to increase the probability of generating target addresses for the next round of detection in high-density regions, the reward of a node region is updated according to the detection results in that region after each round of detection. First, all leaf nodes are taken out of the density space tree as the high-density region set X, and the reward of each leaf node x_i in the set is initialized as follows:

R_i = Beta(α_i, β_i)

wherein R_i represents the reward of leaf node region x_i, Beta denotes the Beta distribution, α_i is the number of active addresses of region x_i, and β_i is the number of inactive addresses of region x_i. Initially, α_i is the number of seed addresses distributed in leaf node region x_i plus 1.

After each iteration, the expected reward of the detected region x_i is updated as follows:

R_i = Beta(α_i + α^*, β_i + β^*)

wherein α^* represents the number of new active addresses generated from the scan results of node region x_i, and β^* represents the number of new inactive addresses generated from the scan results of node region x_i. Let b^* denote the number of target addresses generated in node region x_i in each iteration; then α^* and β^* satisfy b^* = α^* + β^*.
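A small worked example may help fix the arithmetic of the reward update. The numbers are invented for illustration, and the initial value β_i = 1 is an assumption, since only the initial α_i is stated above.

```latex
% Region with 9 seed addresses: \alpha_i = 9 + 1 = 10, \beta_i = 1 (assumed)
R_i = \mathrm{Beta}(10, 1), \qquad \mathbb{E}[R_i] = \tfrac{10}{11} \approx 0.909

% One iteration probes b^* = 100 targets: \alpha^* = 30 active, \beta^* = 70 inactive
R_i = \mathrm{Beta}(10 + 30,\; 1 + 70) = \mathrm{Beta}(40, 71), \qquad
\mathbb{E}[R_i] = \tfrac{40}{111} \approx 0.360
```

After a single round, the optimistic seed-derived estimate of about 0.909 is pulled most of the way toward the observed hit rate of 0.30.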

Fourth, nodes are merged. Specifically, the search space of a node is defined by the variable dimensions of its seed addresses, but this results in an incomplete search space. Therefore, after the space of a child node has been searched, an upward merging method is adopted to ensure that the space not covered by the child nodes is searched in the parent node. When a leaf node region needs to be merged, all leaf nodes of the subtree T rooted at the parent of that leaf node are merged, to ensure that addresses continue to be generated in high-density regions. Because the leaf nodes of T are all contained in the set X of density regions to be searched, the child nodes of every node can be stored during tree construction, and at merge time all the leaf nodes are obtained simply by intersecting with X. The merging strategy for the node parameters is as follows:

(1) Probe address merging: the active addresses (α_f) and inactive addresses (β_f) found in the region of the parent node (f) are equal to the unions of the active and inactive address sets, respectively, of all child nodes (C = {x_1, …, x_j}). The specific relation is as follows:

(2) Reward merging: the reward of the parent node still follows a Beta distribution, namely Beta(α_f, β_f), where α_f and β_f are computed from the active and inactive addresses obtained by strategy (1).

(3) Space merging: the target address generation space of the parent node is equal to the variable space of the parent node minus the variable spaces of the child nodes. The specific relation is as follows:

Therefore, each high-density region is detected iteratively and the active address density distribution is updated through the feedback reward of each iterative scan result. As the number of iterations increases, the density distribution of the seed addresses gradually converges to the density distribution of the actual active addresses, and the need to explore a larger address space is met by merging the nodes of the space tree.
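The three merging strategies can be sketched together on a toy scale. The sketch assumes each region is represented by explicit sets of addresses, which is only practical for illustration, and the function name merge_into_parent and the dictionary keys are hypothetical. It also takes α_f and β_f to be the sizes of the merged sets, which is one reading of strategy (2).

```python
def merge_into_parent(parent_space, children):
    """Merge child regions into their parent.

    parent_space: set of all addresses spanned by the parent's variable space.
    children: list of dicts with keys 'active', 'inactive' and 'space',
    each holding a set of addresses (toy representation, an ASSUMPTION).
    """
    # (1) Probe address merging: unions of the children's result sets.
    active = set().union(*(c["active"] for c in children))
    inactive = set().union(*(c["inactive"] for c in children))
    # (3) Space merging: parent's space minus what the children already cover.
    explored = set().union(*(c["space"] for c in children))
    return {
        "active": active,
        "inactive": inactive,
        # (2) Reward merging: parameters of Beta(alpha_f, beta_f).
        "alpha": len(active),
        "beta": len(inactive),
        "gen_space": parent_space - explored,
    }
```

After merging, new target addresses for the parent are drawn only from gen_space, so budget is not spent again on addresses the children have already probed.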

In order to more clearly embody the beneficial effects of the reinforcement learning-based IPv6 active address probing method of the present application, an embodiment for evaluating the effects of the method in practical applications is described below.

Specifically, routing protocol prefixes (BGP prefixes are selected here) containing more than 1000 active addresses are randomly selected for active address detection, the budget is set to 10 times the number of input active addresses (seed addresses), and the resulting active address detection efficiency is shown in FIG. 3. In each BGP prefix, the reinforcement-learning-based IPv6 active address detection algorithm AddrMiner-S outperforms the active address detection algorithms of the related art (such as DET and 6Hit shown in FIG. 3), and the active address hit rate in the prefix 2001:1291::/32 reaches 35.2%.

To further verify the generality of the algorithm, 2 million active IPv6 addresses are randomly selected as seed addresses from the Hitlist published by Gasser (dated 2021.5.8), the generation budget is set to 1 to 5 million, and the active address hit rates are shown in FIG. 4. When the budget is 5 million, the active address hit rates of the present method and of the related-art detection algorithms are: AddrMiner-S (56.9%), DET (28.9%), 6Hit (21.6%), 6Tree (12.9%), 6Gen (14.6%), and Entropy/IP (3.1%). It can therefore be seen that the reinforcement-learning-based IPv6 active address detection algorithm AddrMiner-S effectively improves address detection efficiency.

In order to achieve the above object, as shown in fig. 5, a second embodiment of the present application proposes an apparatus for detecting an IPv6 active address based on reinforcement learning, which includes the following modules:

an obtaining module 100, configured to obtain an IPv6 seed address, and determine multiple high-density areas of the seed address;

a detection module 200, configured to iteratively detect each of the high-density regions through a pre-trained multi-armed bandit model, the detection module being specifically configured to: generate a preset number of target addresses in each high-density region, and detect whether each target address is an active address;

an updating module 300, configured to determine an active address number and an inactive address number in the preset number of target addresses, and update an expected reward of a corresponding high-density area according to the active address number and the inactive address number;

an iteration module 400, configured to control the detection module and the update module to repeatedly run so as to converge the density distribution of the seed addresses to the density distribution of the active addresses by performing iterative detection on each high-density region.
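The interplay of the detection, updating, and iteration modules can be sketched in Python as below. This is a minimal simulation under stated assumptions, not the patented implementation: probing is simulated with a random draw, the expected reward is taken to be the observed fraction of active addresses, and the budget is split in proportion to the current rewards. All class, field, and function names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Region:
    """One high-density region, treated as one arm of the bandit."""
    name: str
    true_density: float   # simulation only: chance that a generated target is active
    reward: float = 1.0   # expected reward; optimistic start so every region is tried
    active: int = 0
    inactive: int = 0


def probe_region(region, count, rng):
    """Detection module stand-in: simulate probing `count` generated targets."""
    hits = sum(rng.random() < region.true_density for _ in range(count))
    return hits, count - hits


def update_reward(region, hits, misses):
    """Updating module stand-in.

    ASSUMPTION: expected reward = fraction of probed targets found active so far.
    """
    region.active += hits
    region.inactive += misses
    total = region.active + region.inactive
    if total:
        region.reward = region.active / total


def run(regions, budget_per_round, rounds, seed=0):
    """Iteration module stand-in: repeat detect -> update for `rounds` rounds."""
    rng = random.Random(seed)
    for _ in range(rounds):
        total_reward = sum(r.reward for r in regions)
        for r in regions:
            # Split this round's budget in proportion to the expected rewards.
            share = r.reward / total_reward if total_reward > 0 else 1 / len(regions)
            hits, misses = probe_region(r, int(budget_per_round * share), rng)
            update_reward(r, hits, misses)
    return regions
```

In this toy setting, a region with a higher simulated density receives a growing share of each round's budget, which is the convergence behaviour the iteration module is described as producing.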

Optionally, in an embodiment of the present application, the obtaining module 100 determines the plurality of high-density regions of the seed address through a density space tree, where a root node of the density space tree represents an active address space and a leaf node of the density space tree represents a high-density region of the seed address. The address detection apparatus further includes a merging module, configured to merge the leaf nodes of the density space tree into the corresponding parent nodes.

Optionally, in an embodiment of the present application, the merging module is further configured to perform probe address merging, reward merging, and space merging on the leaf nodes, where the merging module is specifically configured to perform space merging according to the following formula:

where f.var_space is the variable space of the parent node, x_i.var_space is the variable space of any leaf node, and j is the number of leaf nodes to which the parent node corresponds.
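The three merges performed by the merging module can be illustrated with the following Python sketch. Because the space-merging formula is not reproduced in this text, the sketch assumes that the parent's variable space is the union of the variable spaces of its j leaf nodes and that reward merging sums the leaves' active and inactive counts. The `Node` class and its fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Density-space-tree node; every field name here is hypothetical."""
    var_space: set = field(default_factory=set)  # variable positions of the region
    probed: set = field(default_factory=set)     # target addresses already probed
    active: int = 0
    inactive: int = 0


def merge_leaves(leaves):
    """Merge j leaf nodes into one parent node f (sketch of the three merges)."""
    parent = Node()
    for leaf in leaves:
        parent.probed |= leaf.probed        # probe address merging
        parent.active += leaf.active        # reward merging (ASSUMPTION: sum counts)
        parent.inactive += leaf.inactive
        parent.var_space |= leaf.var_space  # space merging (ASSUMPTION: set union)
    return parent
```

Keeping the probed addresses on the merged node lets later iterations avoid generating the same target twice inside the enlarged region.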

Optionally, in an embodiment of the present application, the detection module 200 is further configured to calculate the preset number of target addresses by the following formula:

N(x_i) = b * p(x_i)

wherein,

wherein N(x_i) is the preset number of target addresses for any high-density region x_i, b represents the budget consumed per iteration of probing, p(x_i) is the share of the target addresses generated in any high-density region x_i, R_i represents the expected reward of any high-density region x_i, V_i is a further quantity defined for any high-density region x_i, and n represents the preset detection area value of said any high-density region x_i.
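The allocation N(x_i) = b * p(x_i) can be illustrated with a short Python sketch. The expression for p(x_i) is not reproduced in this text, so the sketch assumes the simplest choice, p(x_i) = R_i / sum of all R_j, which may differ from the formula actually claimed. The function name `allocate_budget` is hypothetical.

```python
def allocate_budget(rewards, budget):
    """Return N(x_i) = b * p(x_i) for every high-density region.

    ASSUMPTION: p(x_i) = R_i / sum_j R_j; the text does not reproduce the formula.
    """
    total = sum(rewards)
    if total <= 0:
        # No reward information yet: fall back to a uniform split.
        probs = [1 / len(rewards)] * len(rewards)
    else:
        probs = [r / total for r in rewards]
    # Truncate so that the allocated total never exceeds the budget b.
    return [int(budget * p) for p in probs]


if __name__ == "__main__":
    print(allocate_budget([2, 1, 1], 1000))
```

With expected rewards of 2, 1, and 1 and a per-iteration budget of 1000, the first region receives half of the targets and the other two a quarter each.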

Optionally, in an embodiment of the present application, the updating module 300 is specifically configured to update the desired reward of the corresponding high-density area by the following formula:

wherein,

In summary, the IPv6 active address detection apparatus based on reinforcement learning in the embodiment of the present application uses a density space tree to find the high-density regions of the seed addresses, and then uses a reinforcement learning method to iteratively detect each generated high-density region. The density distribution of the seed addresses is updated according to the reward of the active addresses in each iterative detection, so that it moves toward the actual active address distribution and corrects the inconsistency with the active address distribution caused by sampling bias in the seed addresses. The high-density regions of active addresses in a real network can thus be determined and address detection carried out within them, which improves the efficiency of detecting active addresses and saves detection resources.

In order to implement the foregoing embodiments, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the reinforcement-learning-based IPv6 active address detection method according to the embodiment of the first aspect of the present application.

Although the present application has been disclosed in detail with reference to the accompanying drawings, it is to be understood that such description is merely illustrative and not restrictive of the application of the present application. The scope of the present application is defined by the appended claims and may include various modifications, adaptations, and equivalents of the invention without departing from the scope and spirit of the application.

In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, various embodiments or examples and features of different embodiments or examples described in this specification can be combined by one skilled in the art without contradiction.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.

The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

Those skilled in the art will understand that all or part of the steps of the method embodiments above may be carried out by a program instructing related hardware; the program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.

In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims

1. An IPv6 active address detection method based on reinforcement learning is characterized by comprising the following steps:

2. The address detection method according to claim 1, wherein a plurality of high-density regions of the seed address are determined by a density space tree, wherein a root node of the density space tree represents an active address space, and a leaf node of the density space tree represents a high-density region of the seed address, and after step S4, the method further comprises:

and merging the leaf nodes of the density space tree to the corresponding parent nodes.

3. The address detection method of claim 2, wherein merging leaf nodes of the density space tree into corresponding parent nodes comprises: performing probe address merging, reward merging and space merging on leaf nodes, wherein the space merging is performed by the following formula:

4. The address detection method according to claim 1 or 2, wherein the preset number of target addresses is calculated by the following formula:

N(x_i) = b * p(x_i)

wherein,

5. The address detection method of claim 1, wherein the expected reward of the corresponding high-density area is updated by the following formula:

wherein,

6. An apparatus for detecting IPv6 active address based on reinforcement learning, comprising:

7. The address detection apparatus of claim 6, wherein the obtaining module determines a plurality of high-density regions of the seed address through a density space tree, wherein a root node of the density space tree represents an active address space and leaf nodes of the density space tree represent high-density regions of the seed address, the address detection apparatus further comprising:

8. The address detection apparatus according to claim 7, wherein the merging module is further configured to perform probe address merging, reward merging, and space merging on leaf nodes, wherein the merging module is specifically configured to perform the space merging according to the following formula:

9. The address detection apparatus of claim 7, wherein the detection module is further configured to calculate the preset number of target addresses by the following formula:

N(x_i) = b * p(x_i)

wherein,

10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the reinforcement learning based IPv6 active address probing method according to any one of claims 1-5.