CN108809726B

CN108809726B - Method and system for covering node by box

Info

Publication number: CN108809726B
Application number: CN201810619254.XA
Authority: CN
Inventors: 廖好; 吴兴桐; 周明洋; 陆克中; 毛睿; 吴向阳
Original assignee: Shenzhen University
Current assignee: Shenzhen University
Priority date: 2018-06-15
Filing date: 2018-06-15
Publication date: 2021-10-15
Anticipated expiration: 2038-06-15
Also published as: CN108809726A

Abstract

The invention discloses a method and a system for covering nodes with boxes. A judgment probability value is generated at random. If the judgment probability value is smaller than an execution probability value, the non-central node with the largest net capacity is set as the central node and the net box nodes of the central node are marked; if the judgment probability value is greater than or equal to the execution probability value, a randomly chosen non-central node is set as the central node and its net box nodes are marked. The central node and its marked net box nodes form a box, the number of boxes is increased by 1, and these steps are repeated until no unmarked node remains. Because the MEMB algorithm selects as central node the non-central node with the largest net capacity, whereas the RS algorithm selects a random non-central node, setting the execution probability value combines the two algorithms, keeping the computational complexity relatively low while keeping the result precision relatively high.

Description

Method and system for covering node by box

Technical Field

The present invention relates to the field of box covering, and in particular to a method and a system for covering nodes with boxes.

Background

The world of complex networks is vast, and the interactions among elements of nature and society can be described by complex networks. Data partitioning and normalizing transformations arising from the dimensionality and breadth of data in complex networks involve the box covering problem, which appears in quantum networks, statistical physics and computer science: the network is partitioned with the minimum number of boxes such that the centers of adjacent boxes are linked by entanglement. The box covering problem was first proposed by Song in the literature in order to compute the distribution of a given network of N nodes; in complex network terms, a box is a subgraph whose diameter is smaller than the scale (box size) B.
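The definition of a box above can be checked mechanically. The following Python sketch is illustrative only: it assumes an undirected, unweighted graph stored as an adjacency dict, and it measures distances inside the induced subgraph (measuring them in the whole network is another possible reading of the definition). The helper names `induced_diameter` and `is_box` are hypothetical.

```python
from collections import deque


def induced_diameter(adj, nodes):
    """Diameter of the subgraph induced by `nodes`; inf if it is disconnected."""
    nodes = set(nodes)
    diameter = 0
    for source in nodes:
        # Breadth-first search restricted to the candidate box.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in nodes and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(nodes):
            return float("inf")  # some node is unreachable inside the box
        diameter = max(diameter, max(dist.values()))
    return diameter


def is_box(adj, nodes, box_size):
    """A node set is a box if its induced diameter is smaller than box_size (B)."""
    return induced_diameter(adj, nodes) < box_size
```

For a five-node path 0-1-2-3-4, the set {0, 1, 2} has diameter 2, so it is a box for B = 3 but not for B = 2.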

Box covering algorithms in the prior art suffer from one of two problems: either their computational complexity grows exponentially as the number of nodes increases, or their calculation is simple but the accuracy of the result is low.

Disclosure of Invention

The main object of the invention is to provide a method and a system for covering nodes with boxes that solve the technical problem of existing box covering algorithms being either computationally complex or insufficiently precise.

To achieve the above object, a first aspect of the present invention provides a method for covering nodes with boxes, wherein, for a set of nodes in network data, the method comprises:

step 1, traversing the marking information of the nodes; if unmarked nodes exist, randomly generating a judgment probability value and comparing whether the judgment probability value is smaller than an execution probability value, wherein the judgment probability value is between 0% and 100%;

step 2, if the judgment probability value is smaller than the execution probability value, setting the non-central node with the largest net capacity in the node set as a central node and marking the net box nodes of the central node, wherein the net box nodes are the unmarked nodes within a preset radius centered on the central node, and the net capacity of a node is the number of its net box nodes plus one for the node itself;

step 3, if the judgment probability value is greater than or equal to the execution probability value, randomly setting one non-central node as a central node and marking the net box nodes of the central node;

step 4, forming a box from the central node and its marked net box nodes and increasing the number of boxes by 1, wherein the initial value of the number of boxes is 0; and

step 5, if no unmarked node exists, outputting the number of boxes.
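Steps 1 to 5 can be illustrated with a minimal Python sketch, not the inventors' implementation. It assumes an undirected, unweighted graph given as an adjacency dict with sortable node labels, and it measures the preset radius in hops. It also assumes that an unmarked node counts among its own net box nodes, as in the FIG. 6 example where central node 3 is itself marked; the extra one added for the node itself is a constant, so it does not change which node has the largest net capacity. The retry when a central node has no net box node borrows step 6 of the first embodiment. The function names are hypothetical, and every net capacity is naively recomputed in each MEMB round.

```python
import random
from collections import deque


def ball(adj, source, radius):
    """All nodes within `radius` hops of `source`, including `source` (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] < radius:
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    return set(dist)


def hybrid_box_cover(adj, radius, p_exec, seed=None):
    """Count boxes, choosing each center MEMB-style with probability p_exec."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    centers, covered = set(), set()  # central nodes / marked non-central nodes
    n_boxes = 0                      # initial number of boxes is 0

    def net_box(u, unmarked):
        # Assumption: u itself is included when u is unmarked.
        return ball(adj, u, radius) & unmarked

    while True:
        unmarked = set(nodes) - centers - covered
        if not unmarked:             # step 5: output the number of boxes
            return n_boxes
        non_central = [u for u in nodes if u not in centers]
        if rng.random() < p_exec:    # steps 1-2: largest net capacity wins
            center = max(non_central,
                         key=lambda u: len(net_box(u, unmarked)) + 1)
        else:                        # step 3: random non-central node
            center = rng.choice(non_central)
        marked = net_box(center, unmarked)
        if not marked:               # no net box node: cancel center and retry
            continue
        centers.add(center)
        covered.discard(center)      # a covered node may later become a center
        covered |= marked - {center}
        n_boxes += 1                 # step 4
```

For example, on the five-node path 0-1-2-3-4 with radius 1 and p_exec = 1.0, the sketch picks node 1 and then node 3 and returns 2. Setting p_exec = 1.0 always takes the MEMB branch and p_exec = 0.0 always takes the RS branch.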

To achieve the above object, a second aspect of the present invention provides a system for covering nodes with boxes, characterized in that, for a set of nodes in network data, the system comprises:

a generation and comparison module, configured to traverse the marking information of the nodes and, if unmarked nodes exist, randomly generate a judgment probability value and compare whether the judgment probability value is smaller than an execution probability value, wherein the judgment probability value is between 0% and 100%;

a first marking module, configured to, if the judgment probability value is smaller than the execution probability value, set the non-central node with the largest net capacity in the node set as a central node and mark the net box nodes of the central node, wherein the net box nodes are the unmarked nodes within a preset radius centered on the central node, and the net capacity of a node is the number of its net box nodes plus one for the node itself;

a second marking module, configured to, if the judgment probability value is greater than or equal to the execution probability value, randomly set one non-central node as a central node and mark the net box nodes of the central node;

an increasing module, configured to form a box from the central node and its marked net box nodes and increase the number of boxes by 1, wherein the initial value of the number of boxes is 0; and

an output module, configured to output the number of boxes if no unmarked node exists.

The invention provides a method and a system for covering nodes with boxes. A judgment probability value is generated at random and compared with the execution probability value to determine how the central node is selected. If the judgment probability value is smaller than the execution probability value, the central node is the non-central node with the largest net capacity, as in the maximum-excluded-mass-burning (MEMB) algorithm; if the judgment probability value is greater than or equal to the execution probability value, the central node is a randomly selected non-central node, as in the random sequential (RS) box covering algorithm. Setting the execution probability value appropriately combines the MEMB algorithm with the RS algorithm, so that the computational complexity is relatively low and the result precision is relatively high.

Drawings

In order to illustrate more clearly the embodiments of the present invention or the technical solutions in the prior art, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a flowchart of a method for covering nodes with boxes according to the first embodiment of the present invention;

FIG. 2 is a schematic flowchart of the refined steps of step 2 in the first embodiment of the present invention;

FIG. 3 is a schematic flowchart of the refined steps of step 22 in the first embodiment of the present invention;

FIG. 4 is a schematic flowchart of the refined steps of step 23 in the first embodiment of the present invention;

FIG. 5 is a schematic diagram of the node array algorithm in a small network according to the first embodiment of the present invention;

FIG. 6 is a schematic diagram of the node vector algorithm in a small network according to the first embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a system for covering nodes with boxes according to the second embodiment of the present invention;

FIG. 8 is a schematic structural diagram of the refinement modules of the first marking module B in the second embodiment of the present invention;

FIG. 9 is a line graph of the number of covering boxes at different scales in large-scale networks, obtained with the method and system for covering nodes with boxes of the present invention;

FIG. 10 is a line graph of the running time of the method and system for covering nodes with boxes of the present invention in large-scale networks;

FIG. 11 is a line graph of the running time, in networks of different scales, of the method and system for covering nodes with boxes of the present invention using a node array and a node vector, compared with the original MEMB algorithm and the original RS algorithm.

Detailed Description

In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Existing box covering algorithms in the prior art suffer from the technical problem of either complex calculation or insufficient result accuracy.

To solve the above technical problem, the present invention provides a method and a system for covering nodes with boxes. A judgment probability value is generated at random and compared with the execution probability value to determine how the central node is selected. If the judgment probability value is smaller than the execution probability value, the central node is the non-central node with the largest net capacity, which corresponds to the MEMB algorithm; if the judgment probability value is greater than or equal to the execution probability value, the central node is a randomly selected non-central node, which corresponds to the RS algorithm. Setting the execution probability value combines the MEMB algorithm with the RS algorithm, which keeps the computational complexity relatively low and the result precision relatively high.

FIG. 1 is a flowchart of a method for covering nodes with boxes according to the first embodiment of the present invention. The method comprises the following steps:

step 1, traversing the marking information of the nodes; if unmarked nodes exist, randomly generating a judgment probability value and comparing whether the judgment probability value is smaller than an execution probability value, wherein the judgment probability value is between 0% and 100%;

step 2, if the judgment probability value is smaller than the execution probability value, setting the non-central node with the largest net capacity in the node set as a central node and marking the net box nodes of the central node, wherein the net box nodes are the unmarked nodes within a preset radius centered on the central node, and the net capacity of a node is the number of its net box nodes plus one for the node itself;

Please refer to FIG. 2, which is a flowchart of the refined steps of step 2 according to the first embodiment of the present invention. Specifically, step 2 comprises the following steps:

step 21, if the judgment probability value is smaller than the execution probability value, acquiring the data capacity of the non-central nodes in the node set, and judging whether the data capacity of the non-central nodes is smaller than a preset storage threshold;

step 22, if the data capacity of the non-central node is smaller than the storage threshold, determining the non-central node with the maximum net capacity by using the node array, setting the non-central node with the maximum net capacity as the central node, and marking the net box node of the central node;

and step 23, if the data capacity of the non-central node is greater than or equal to the storage threshold, determining the non-central node with the maximum net capacity by using the node vector, setting the non-central node with the maximum net capacity as the central node, and marking the net box node of the central node.

Please refer to FIG. 3, which is a flowchart of the refined steps of step 22 according to the first embodiment of the present invention. Specifically, step 22 comprises the following steps:

step 221, if the data capacity of the non-central node is smaller than the storage threshold, storing the non-central node and the net box node of the non-central node by using a node array;

step 222, sequentially calculating the net capacity of the non-central nodes, and determining the non-central node with the maximum net capacity;

step 223, setting the non-central node with the largest net capacity as the central node, looking up the node array entry corresponding to the central node, and marking the net box nodes in that entry.

Please refer to FIG. 4, which is a flowchart of the refined steps of step 23 according to the first embodiment of the present invention. Specifically, step 23 comprises the following steps:

step 231, if the data capacity of the non-central nodes is greater than or equal to the storage threshold, counting the net capacity of the non-central nodes by using the node vector, and determining the non-central node with the largest net capacity;

step 232, setting the non-central node with the largest net capacity as the central node, and searching for the net box nodes of the central node;

and step 233, marking the net box nodes of the central node one by one and, taking each marked net box node as a center, decreasing by 1 the net capacity in the node vector of every node within the preset radius.

step 3, if the judgment probability value is greater than or equal to the execution probability value, randomly setting one non-central node as the central node, and marking the net box nodes of the central node;

step 4, forming a box from the central node and its marked net box nodes, and increasing the number of boxes by 1, wherein the initial value of the number of boxes is 0;

and step 5, outputting the number of boxes if no unmarked node exists.

In addition, in the first embodiment of the present invention, the method further comprises, after step 2 and step 3:

step 6, if the central node has no net box node, canceling the setting of the central node and returning to step 1.
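The storage-threshold dispatch of step 21 and the node array path of steps 221 to 223 can be sketched as below. This is a minimal illustration under stated assumptions: the "data capacity" and the storage threshold are plain numbers in the same (unspecified) unit, the node array stores for each non-central node every node within the preset radius, and the current net box nodes are obtained by intersecting that stored set with the unmarked set at lookup time. All function names are hypothetical; the BFS helper `ball` is included so the block runs on its own.

```python
from collections import deque


def ball(adj, source, radius):
    """All nodes within `radius` hops of `source`, including `source` (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] < radius:
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    return set(dist)


def choose_structure(data_capacity, storage_threshold):
    """Step 21 (sketch): node array below the threshold, node vector otherwise."""
    return "node_array" if data_capacity < storage_threshold else "node_vector"


def build_node_array(adj, non_central, radius):
    """Step 221 (sketch): store each non-central node with the nodes in its radius."""
    return {u: ball(adj, u, radius) for u in non_central}


def mark_with_node_array(node_array, unmarked):
    """Steps 222-223 (sketch): pick the largest net capacity, mark its net box nodes.

    Mutates `unmarked` in place and removes the new center from the array.
    """
    best, best_cap = None, -1
    for u in sorted(node_array):
        cap = len(node_array[u] & unmarked) + 1  # net box nodes + the node itself
        if cap > best_cap:                       # first maximum wins ties
            best, best_cap = u, cap
    marked = node_array.pop(best) & unmarked     # look up the stored entry
    unmarked -= marked                           # mark the net box nodes
    return best, marked
```

On the path 0-1-2-3-4 with radius 1, the first call selects node 1 and marks {0, 1, 2}; the second selects node 3 and marks {3, 4}. Precomputing the radius sets trades memory for speed, which is consistent with the text's choice of the lighter node vector once the storage threshold is reached.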

Before the method for covering nodes with boxes is executed, the node set in the network data is read and its nodes are initialized. The node set is denoted N, the set of central nodes Center, the set of marked nodes (excluding central nodes) Covered, and the set of unmarked nodes non-Center or Unneighboring. Values are then set for the execution probability value and the preset radius. The marking information of the nodes is traversed; if unmarked nodes exist, a judgment probability value is generated at random. If the judgment probability value is smaller than the execution probability value, the non-central node with the largest net capacity is set as the central node, i.e., the MEMB algorithm is executed, and the net box nodes of the central node are marked; if the judgment probability value is greater than or equal to the execution probability value, a randomly chosen non-central node is set as the central node, i.e., the RS algorithm is executed, and the net box nodes of the central node are marked. The net box nodes are the unmarked nodes within the preset radius centered on the central node. Since every node in the node set is either a central node, a marked node or an unmarked node, the non-central nodes comprise both the marked and the unmarked nodes. The central node and its marked net box nodes form a box, and the number of boxes is increased by 1. The method is repeated until no unmarked node remains, and the number of boxes is then output.

It should be noted that the running time of the pure MEMB algorithm is large at small scales and gradually levels off as the network scale increases. Therefore, retrieving the net capacity of nodes faster would increase the running speed of the MEMB algorithm. Taking the available memory into account, and given a preset storage threshold, two ways of optimizing the MEMB algorithm are chosen: (1) a node array stores all box nodes; for example, a buffer array stores each non-central node together with its net box nodes, so that the non-central node with the largest net capacity and its net box nodes can be determined quickly; (2) a node vector records the net capacity of each non-central node, so that the non-central node with the largest net capacity can be determined quickly. Storing all box nodes in a node array requires more memory than a node vector. In practical applications, either optimization can be used alone, or the two can be combined. The node array algorithm and the node vector algorithm are described below:

1. Please refer to FIG. 5, which is a schematic diagram of the node array algorithm in a small network according to the first embodiment of the present invention, where italic numbers denote the central node and its marked net box nodes, and bold numbers denote the non-central node with the currently largest net capacity and its net box nodes. Initially, no node in the node set is marked, so all nodes are non-central nodes. A node array stores each non-central node and its net box nodes; the net capacity of each non-central node is calculated, and the non-central node with the largest net capacity is determined and set as the central node; the node array entry corresponding to the central node is then looked up, and the net box nodes in it are marked. If the MEMB algorithm is optimized with the node array alone, then when the data capacity of the non-central nodes is greater than or equal to the storage threshold, the node array stores only as many non-central nodes and their net box nodes as the storage threshold allows. The net capacity of each non-central node in the node set is calculated, the non-central node with the largest net capacity is determined and set as the central node, and it is checked whether the central node is stored in the node array. If it is, the node array entry corresponding to the central node is looked up and its net box nodes are marked; if it is not, the net box nodes of the central node are searched for and marked one by one.

2. Please refer to FIG. 6, which is a schematic diagram of the node vector algorithm in a small network according to the first embodiment of the present invention, where bold numbers denote the non-central node with the currently largest net capacity and its net capacity. Initially, no node in the node set is marked, i.e., all nodes in the node set are non-central nodes. A node vector, such as NC, records the non-central nodes and their net capacities. The non-central node with the largest net capacity in the node vector is determined and set as the central node, and the net box nodes of the central node are searched for and marked one by one; each time one net box node is marked, the net capacity recorded in the node vector is decreased by 1 for every node within the preset radius of that net box node. Taking central node 3 in FIG. 6 as an example: when node 1 is marked, the net capacities of nodes 1, 2 and 3, which lie within the preset radius of node 1, are each decreased by 1; when node 2 is marked, the net capacities of nodes 1, 2 and 3 within the preset radius of node 2 are each decreased by 1; when node 3 is marked, the net capacities of nodes 1, 2, 3 and 4 within the preset radius of node 3 are each decreased by 1; and when node 4 is marked, the net capacities of nodes 3, 4 and 5 within the preset radius of node 4 are each decreased by 1. At this point all net box nodes of central node 3 have been marked, and the values in the node vector have changed from panel (a) to panel (b).
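The node vector bookkeeping of steps 231 to 233 and the walkthrough above can be sketched as follows. In this illustrative Python sketch the node vector NC is a dict from each non-central node to its net capacity, counted as the unmarked nodes within the radius plus one for the node itself. The edge list in the test is an assumption chosen so that the radius-1 neighborhoods of nodes 1 to 4 match the ones listed above; it is not the actual FIG. 6 graph. Function names are hypothetical.

```python
from collections import deque


def ball(adj, source, radius):
    """All nodes within `radius` hops of `source`, including `source` (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] < radius:
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    return set(dist)


def init_node_vector(adj, non_central, unmarked, radius):
    """Step 231 (sketch): NC[u] = unmarked nodes within the radius of u, plus 1."""
    return {u: len(ball(adj, u, radius) & unmarked) + 1 for u in non_central}


def mark_with_node_vector(adj, nc, unmarked, radius):
    """Steps 232-233 (sketch): take the largest NC entry as center, then mark
    its net box nodes one by one, decrementing NC around each marked node."""
    center = max(sorted(nc), key=nc.get)          # first maximum wins ties
    marked = sorted(ball(adj, center, radius) & unmarked)
    for v in marked:
        unmarked.discard(v)                       # mark net box node v
        for w in ball(adj, v, radius):            # nodes within radius of v
            if w in nc:
                nc[w] -= 1
    del nc[center]                                # center leaves the vector
    return center, marked
```

With the assumed graph (edges 1-2, 1-3, 2-3, 3-4, 4-5) and radius 1, NC starts as {1: 4, 2: 4, 3: 5, 4: 4, 5: 3}; node 3 is selected, nodes 1, 2, 3 and 4 are marked in turn with exactly the decrements described above, and the vector ends as {1: 1, 2: 1, 4: 2, 5: 2}, which equals a fresh recount. Each marking costs one radius search around the newly marked node instead of a recount for every non-central node.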

Further, if, after the non-central node with the largest net capacity or a randomly selected non-central node has been set as the central node, no unmarked node exists within the preset radius centered on the central node, that is, the central node has no net box node, the setting of the central node is canceled and step 1 is executed again.

In this embodiment of the invention, setting the execution probability value combines the MEMB algorithm with the RS algorithm, keeping the computational complexity relatively low and the result precision relatively high. Comparing the data capacity of the non-central nodes with the storage threshold determines whether a node array or a node vector is used to find the non-central node with the largest net capacity. This makes full use of the available memory, and the node array and node vector speed up retrieval of node net capacities, which increases the running speed of the MEMB algorithm. Canceling the setting of a central node that has no net box node prevents a box that covers no new node from being counted, which effectively reduces the number of boxes and greatly improves the result precision.

FIG. 7 is a schematic structural diagram of a system for covering nodes with boxes according to the second embodiment of the present invention. Specifically, the system comprises the following modules:

the generation and comparison module A, configured to traverse the marking information of the nodes and, if unmarked nodes exist, randomly generate a judgment probability value and compare whether the judgment probability value is smaller than the execution probability value, wherein the judgment probability value is between 0% and 100%;

the first marking module B, configured to, if the judgment probability value is smaller than the execution probability value, set the non-central node with the largest net capacity in the node set as a central node and mark the net box nodes of the central node, wherein the net box nodes are the unmarked nodes within a preset radius centered on the central node, and the net capacity of a node is the number of its net box nodes plus one for the node itself;

the second marking module C, configured to, if the judgment probability value is greater than or equal to the execution probability value, randomly set a non-central node as a central node and mark the net box nodes of the central node;

the increasing module D, configured to form a box from the central node and its marked net box nodes and increase the number of boxes by 1, wherein the initial value of the number of boxes is 0;

and the output module E, configured to output the number of boxes if no unmarked node exists.

For the related description of the embodiment of the present invention, please refer to the related description of the first embodiment of the present invention, which is not repeated herein.

In the embodiment of the invention, the MEMB algorithm and the RS algorithm are combined by setting the execution probability value, so that the calculation complexity is relatively low, and the result precision is relatively high.

FIG. 8 is a schematic structural diagram of the refinement modules of the first marking module B according to the second embodiment of the present invention.

the first marking module B specifically includes the following modules:

the obtaining and judging module B1, configured to, if the judgment probability value is smaller than the execution probability value, obtain the data capacity of the non-central nodes in the node set, and judge whether the data capacity of the non-central nodes is smaller than a preset storage threshold;

a first determining module B2, configured to, if the data capacity of a non-central node is smaller than the storage threshold, determine, by using the node array, the non-central node with the largest net capacity, set the non-central node with the largest net capacity as a central node, and mark a net box node of the central node;

a second determining module B3, configured to, if the data capacity of the non-central node is greater than or equal to the storage threshold, determine, using the node vector, the non-central node with the largest net capacity, set the non-central node with the largest net capacity as the central node, and mark a net box node of the central node.

The first determining module B2 specifically includes the following modules:

a storage module B21, configured to, if the data capacity of the non-central node is smaller than the storage threshold, store the non-central node and a net box node of the non-central node using the node array;

a calculation determination module B22, configured to sequentially calculate the net capacities of the non-central nodes, and determine the non-central node with the largest net capacity;

and a setting and marking module B23, configured to set the non-central node with the largest net capacity as the central node, look up the node array entry corresponding to the central node, and mark the net box nodes in that entry.

The second determination module B3 specifically includes the following modules:

a statistic determination module B31, configured to, if the data capacity of the non-central node is greater than or equal to the storage threshold, use the node vector to count the net capacity of the non-central node, and determine the non-central node with the largest net capacity;

a setting and searching module B32, configured to set the non-central node with the largest net capacity as the central node and search for the net box nodes of the central node;

and a marking and decrementing module B33, configured to mark the net box nodes of the central node one by one and, taking each marked net box node as a center, decrease by 1 the net capacity in the node vector of every node within the preset radius.

In this embodiment of the invention, setting the execution probability value combines the MEMB algorithm with the RS algorithm, keeping the computational complexity relatively low and the result precision relatively high. Comparing the data capacity of the non-central nodes with the storage threshold determines whether a node array or a node vector is used to find the non-central node with the largest net capacity. This makes full use of the available memory, and the node array and node vector speed up retrieval of node net capacities, which increases the running speed of the MEMB algorithm.

In addition, in the second embodiment of the present invention, the system further comprises, after the first marking module B or the second marking module C:

a cancellation and return module, configured to cancel the setting of the central node if the central node has no net box node, and to return to the generation and comparison module A.

In this embodiment of the invention, when the central node has no net box node, canceling the setting of that central node effectively reduces the number of boxes and thereby greatly improves the result precision.

The practical feasibility and effectiveness of the invention are described below in terms of two aspects: the number of covering boxes and the running time of the algorithm. Specifically:

1. The algorithm inherits the characteristics of the MEMB algorithm: it ensures that the nodes in each box are connected, and it does not require multiple simulation runs when assigning nodes to boxes. The MEMB algorithm and the RS algorithm are mixed together, and the algorithm is easy to implement. Through the chosen execution probability value, the balance between the box count produced by the MEMB algorithm and that produced by the RS algorithm can be tuned manually, so the method can be adapted to different requirements.

The algorithm is run in 11 variants, with the execution probability value increasing from 0.0 to 1.0 in steps of 0.1; these include the pure MEMB algorithm (execution probability value 1.0) and the pure RS algorithm (execution probability value 0.0). Standard data sets are used to demonstrate the performance of the algorithm.
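The 11 settings can be generated as below. This small sketch (the function name is hypothetical) computes each value as i / 10 rather than by repeatedly adding 0.1, because repeated addition drifts in binary floating point (0.1 + 0.1 + 0.1 evaluates to 0.30000000000000004). It assumes the convention that the MEMB branch runs when a uniform draw in [0, 1) is below the execution probability value, so 1.0 always selects MEMB and 0.0 never does.

```python
def execution_settings(steps=10):
    """Return (execution probability value, label) pairs from 0.0 to 1.0."""
    settings = []
    for i in range(steps + 1):
        p = i / steps  # exact division avoids accumulated rounding error
        if p == 0.0:
            label = "pure RS"      # the MEMB branch is never taken
        elif p == 1.0:
            label = "pure MEMB"    # the MEMB branch is always taken
        else:
            label = "hybrid"
        settings.append((p, label))
    return settings
```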

(1) Number of covering boxes. Please refer to FIG. 9, which is a line graph of the number of covering boxes at different scales in large-scale networks, obtained with the method and system for covering nodes with boxes of the present invention. The vertical axis represents the number of covering boxes and the horizontal axis represents the scale. Panels (a) and (b) show small-world networks, plotted on semi-logarithmic axes. Panels (c) and (d) show fractal networks, plotted on log-log axes, where (c) is a fractal network obtained by adding edges to the small-world network at a ratio c = 0.05, and (d) is a World Wide Web network.

As can be seen from FIG. 9, the number of covering boxes is controlled by the execution probability value. For large-scale networks, even with an execution probability value of only 0.1, the number of covering boxes is at least halved compared with the result of the RS algorithm, at small scales as well as at medium and large scales. When the execution probability value is greater than or equal to 0.7, the ratio between the algorithm's results and those of the pure MEMB algorithm deviates by no more than 0.1 at every network size and scale; in particular, as the scale increases, the difference in the number of covering boxes eventually stays below 1. In the fractal networks, the results fluctuate at large scales when the execution probability value is below 0.4: because the RS algorithm then accounts for too large a share, it has a noticeable negative effect at large scales, where only a small number of boxes is needed. Even so, this behavior reflects the controlled randomness introduced by combining the RS algorithm with the MEMB algorithm. Apart from this, the algorithm gives excellent box counts in all the large-scale networks.

(2) Running time. Please refer to FIG. 10, which is a line graph of the running time of the method and system for covering nodes with boxes of the present invention in large-scale networks. The vertical axis represents the running time of the algorithm and the horizontal axis represents the scale. Panels (a) and (b) show small-world networks, plotted on semi-logarithmic axes. Panels (c) and (d) show fractal networks, plotted on log-log axes, where (c) is a fractal network obtained by adding edges to the small-world network at a ratio c = 0.05, and (d) is a World Wide Web network.

Fig. 10 shows that, once the network is large enough, the running time of the algorithm decreases approximately as a power law of the scale. At very small scales (about 3), the running time is very close to that of the RS algorithm: its relative difference from the RS algorithm is far smaller than its difference from the MEMB algorithm. Even at very large scales, the worst-case running time essentially does not exceed that of the MEMB algorithm and at worst fluctuates around it. On smaller networks, the running time of the MEMB algorithm varies erratically across scales, whereas the RS algorithm gives a relatively stable curve; the present algorithm inherits the time efficiency of the RS algorithm, and measurements show that its running time is both favorable and stable. These results demonstrate that the present algorithm makes controlled, effective use of the RS algorithm and substantially reduces the high running cost of the MEMB algorithm.

2. Fig. 11 is a line graph of the running time, at different scales and on networks of different sizes, of the MEMB algorithm using a node array, the MEMB algorithm using a node vector, the original MEMB algorithm and the original RS algorithm, for the box-covering method and system of the present invention. The vertical axis is the running time of the algorithm and the horizontal axis is the scale; panels (a) to (d) show the same networks, with the same logarithmic axes, as in fig. 9.

Fig. 11 shows that, on networks of all sizes, the MEMB algorithm using a node array and the MEMB algorithm using a node vector both have a far lower running time than the pure MEMB algorithm at small scales, approaching the pure RS algorithm at a scale of about 3. In particular, on networks with more than one hundred thousand nodes, the MEMB algorithm using a node vector remains practical at medium-to-large scales (scale greater than 13), with a running time comparable to that of the pure MEMB algorithm. Both variants therefore outperform the pure MEMB algorithm at small and large scales. At intermediate scales their running times fluctuate around that of the pure MEMB algorithm, which indicates that when each box covers many nodes, the cost and the benefit of maintaining the node array or node vector while updating the net capacity of uncovered nodes essentially cancel out, so no net penalty arises. In practice, the MEMB algorithm using a node array and the MEMB algorithm using a node vector are therefore highly advantageous for small-scale box covering of large-scale networks.
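The two MEMB variants differ in how the net capacity of uncovered nodes is kept up to date. The Python sketch below contrasts them under stated assumptions: the node-array strategy stores each node's ball once and recounts its unmarked entries on demand (claims 3 and 8), while the node-vector strategy keeps one counter per node and, whenever a node is marked, decrements the counter of every node within the radius of it (claim 9). The helper names `ball`, `node_array_capacities` and `NodeVector`, the undirected adjacency-dict input, and the reading of "net capacity" as the number of unmarked nodes within the radius are all assumptions for illustration.

```python
from collections import deque


def ball(adj, source, radius):
    """Return the set of nodes within `radius` hops of `source`, found by BFS."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == radius:
            continue  # do not expand past the radius
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return seen


def node_array_capacities(arrays, marked):
    """Node-array strategy: recount unmarked entries in each stored ball."""
    return {v: sum(1 for u in members if u not in marked)
            for v, members in arrays.items()}


class NodeVector:
    """Node-vector strategy: one net-capacity counter per node, kept current."""

    def __init__(self, adj, radius):
        self.adj = adj
        self.radius = radius
        # Before any marking, each node's net capacity is its ball size.
        self.capacity = {v: len(ball(adj, v, radius)) for v in adj}
        self.marked = set()

    def mark(self, node):
        """Mark `node` and decrement the counters it contributes to."""
        if node in self.marked:
            return  # already covered; counters were updated then
        self.marked.add(node)
        # In an undirected graph, `node` lies within the radius of exactly the
        # nodes in its own ball, so each of their net capacities drops by 1.
        for other in ball(self.adj, node, self.radius):
            self.capacity[other] -= 1


if __name__ == "__main__":
    path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 7] for i in range(7)}
    arrays = {v: sorted(ball(path, v, 1)) for v in path}  # stored once
    vector = NodeVector(path, 1)
    for v in (0, 1, 2):  # net box nodes of a center at node 1
        vector.mark(v)
    print(vector.capacity)  # {0: 0, 1: 0, 2: 1, 3: 2, 4: 3, 5: 3, 6: 2}
    print(vector.capacity == node_array_capacities(arrays, vector.marked))  # True
```

Storing every ball explicitly costs memory proportional to the total ball size but makes lookups of a center's net box nodes trivial, which is presumably why claim 2 reserves the node array for cases below the storage threshold; the node vector needs only one counter per node.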

Secondly, the invention offers clear advantages in at least the fields of scientific research and e-commerce, as follows:

1. In scientific research, the algorithm can greatly increase the actual running speed at the cost of a small loss of precision, where the research permits. Besides complex networks in general, it can be applied to studying the evolution of proteins in biological protein interaction networks. A protein interaction network is an ultra-large-scale small-world network, and box covering it can greatly advance research on such networks; with the pure MEMB algorithm alone, however, the running time becomes prohibitive. Because the algorithm builds on a greedy algorithm that yields an approximately optimal solution, the evolution result of a specified protein can be obtained relatively quickly at the cost of slightly lower precision. The MEMB algorithm using a node array and the MEMB algorithm using a node vector are suited to small-scale box covering of large complex networks: they achieve efficiency close to that of the RS algorithm without losing the high precision of the pure MEMB algorithm. In this application the method is implemented in software and serves as a core step.

2. In e-commerce, for example on large online shopping platforms such as Taobao, Jingdong and Amazon, the method can be applied to essential tasks such as fast iterative computation of users' product preferences and coarse-grained classification statistics over bulk product categories. Here it is likewise implemented in software and applied in the data preprocessing stage after the data are imported.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; multiple modules may be combined or integrated into another system, and so on. Although, for simplicity of explanation, the methods of the present invention are shown and described as a series of acts, those skilled in the art will understand and appreciate that the present invention is not limited by the order of the acts, as some steps may, in accordance with the present invention, occur in other orders or concurrently. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred, and that the acts and modules involved are not necessarily all required by the invention. Each of the above embodiments emphasizes different aspects; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.

The method and system for box-covering nodes have been described above. Those skilled in the art may make changes to the specific implementation and the scope of application in accordance with the idea of the embodiments of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims

1. A method for box-covering nodes, applied to a set of nodes in network data, the method comprising:

step 1, traversing the marking information of the nodes; if unmarked nodes exist, randomly generating a judgment probability value and comparing whether the judgment probability value is smaller than a preset execution probability value, wherein the judgment probability value is between 0% and 100%;

and step 5, returning to step 1 to randomly generate a new judgment probability value, repeating steps 1 to 4 until no unmarked node exists, and outputting the number of boxes.

2. The method according to claim 1, wherein step 2 specifically comprises the following steps:

step 21, if the judgment probability value is smaller than the execution probability value, acquiring the data capacity of the non-central nodes in the node set, and judging whether the data capacity of the non-central nodes is smaller than a preset storage threshold;

step 22, if the data capacity of the non-central nodes is smaller than the storage threshold, determining the non-central node with the largest net capacity by using a node array, setting that node as a central node, and marking the net box nodes of the central node;

3. The method according to claim 2, wherein step 22 specifically comprises the following steps:

step 221, if the data capacity of the non-central nodes is smaller than the storage threshold, storing each non-central node and its net box nodes in a node array;

step 223, setting the non-central node with the largest net capacity as a central node, looking up the node array corresponding to the central node, and marking the net box nodes in that node array.

4. The method according to claim 2, wherein step 23 specifically comprises the following steps:

step 231, if the data capacity of the non-central nodes is greater than or equal to the storage threshold, counting the net capacity of the non-central nodes by using a node vector, and determining the non-central node with the largest net capacity;

step 232, setting the non-central node with the largest net capacity as a central node, and searching for the net box nodes of the central node;

5. The method according to any one of claims 1 to 4, wherein step 2 or step 3 is further followed by:

6. A system for box-covering nodes, applied to a set of nodes in network data, the system comprising:

a generation and comparison module, configured to traverse the marking information of the nodes, randomly generate a judgment probability value if unmarked nodes exist, and compare whether the judgment probability value is smaller than a preset execution probability value, wherein the judgment probability value is between 0% and 100%;

and an output module, configured to return to the generation and comparison module to randomly generate a new judgment probability value, to invoke the generation and comparison module, the first marking module, the second marking module and the incrementing module repeatedly until no unmarked node exists, and to output the number of boxes.

7. The system according to claim 6, wherein the first marking module specifically comprises the following modules:

an obtaining and judging module, configured to obtain the data capacity of the non-central nodes in the node set if the judgment probability value is smaller than the execution probability value, and to judge whether the data capacity of the non-central nodes is smaller than a preset storage threshold;

a first determining module, configured to, if the data capacity of the non-central nodes is smaller than the storage threshold, determine the non-central node with the largest net capacity by using a node array, set that node as a central node, and mark the net box nodes of the central node;

and a second determining module, configured to, if the data capacity of the non-central nodes is greater than or equal to the storage threshold, determine the non-central node with the largest net capacity by using a node vector, set that node as a central node, and mark the net box nodes of the central node.

8. The system according to claim 7, wherein the first determining module specifically comprises the following modules:

a storage module, configured to store each non-central node and its net box nodes in a node array if the data capacity of the non-central nodes is smaller than the storage threshold;

a calculation and determination module, configured to calculate the net capacity of the non-central nodes in turn and determine the non-central node with the largest net capacity;

and a setting and marking module, configured to set the non-central node with the largest net capacity as a central node, look up the node array corresponding to the central node, and mark the net box nodes in that node array.

9. The system according to claim 7, wherein the second determining module specifically comprises the following modules:

a statistics and determination module, configured to count the net capacity of the non-central nodes by using a node vector if the data capacity of the non-central nodes is greater than or equal to the storage threshold, and to determine the non-central node with the largest net capacity;

a setting and searching module, configured to set the non-central node with the largest net capacity as a central node and search for the net box nodes of the central node;

and a marking and decrementing module, configured to mark the net box nodes of the central node in turn and, taking each marked net box node as a center, decrease by 1 the net capacity recorded in the node vector for every node within a preset radius.

10. The system according to any one of claims 6 to 9, further comprising, after the first marking module or the second marking module:

a return and cancellation module, configured to cancel the setting of the central node and return to the generation and comparison module if the central node has no net box nodes.