CN114168971A - Internet of things coverage vulnerability repairing method based on reinforcement learning - Google Patents


Info

Publication number
CN114168971A
Authority
CN
China
Prior art keywords
node
coverage
vulnerability
repairing
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111502028.1A
Other languages
Chinese (zh)
Inventor
邓贤君
夏云芝
易灵芝
杨天若
朱晨露
杨静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN202111502028.1A priority Critical patent/CN114168971A/en
Publication of CN114168971A publication Critical patent/CN114168971A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577Assessing vulnerabilities and evaluating computer system security
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Abstract

The invention discloses a reinforcement learning-based method for repairing coverage holes in the Internet of Things, which comprises the following steps: (1) establishing a network model according to the monitoring coverage requirement of a target area; (2) establishing a network coverage model through the credible information coverage model, and calculating the coverage rate; (3) determining the directional repair nodes of the vulnerability area by a bidirectional selection method between vulnerability reconstruction points and movable nodes, based on the minimum shortest coordinate distance; (4) training the directional repair nodes (M-nodes) with a Q-Learning method, and repairing the vulnerability sub-grids until the coverage rate meets the requirement or the number of iterations reaches a set upper limit. The method achieves a high overall coverage rate after repair: it exploits, from the perspective of information cooperation, the spatial correlation of the reconstruction points monitored in the target area, and it uses the reward mechanism of reinforcement learning to guide the movement of the directional repair nodes, so that the coverage hole repair is completed while energy consumption and repair time are saved and the coverage rate is improved.

Description

Internet of things coverage vulnerability repairing method based on reinforcement learning
Technical Field
The invention belongs to the technical field of Internet of things, and particularly relates to a reinforcement learning-based Internet of things coverage vulnerability repairing method.
Background
Coverage is a core technical problem in the development of the Internet of Things: adequate coverage is what allows the Internet of Things to meet the basic requirement of collecting monitoring target information accurately and in real time. Coverage holes caused by energy depletion, environmental factors, software defects and other factors can greatly reduce the coverage rate of the Internet of Things and affect the safety and reliability of the network. The task of Internet of Things coverage hole repair is to quickly identify a coverage hole that appears suddenly and to repair it as fast as possible with the lowest possible energy consumption, so as to meet the coverage requirement. Therefore, it is important to repair Internet of Things coverage holes effectively and in real time.
The difficulty of Internet of Things coverage hole repair lies mainly in three aspects. First, network coverage models characterize and define coverage differently for different real-world scenarios, and the choice of model directly affects the applicability of the repair method and its practical application scenarios. Second, a suitable sensor repair node must be selected for each hole so that it can reach the repair position in the shortest time, which keeps the repair timely and reduces repair energy consumption. Third, when a sensor node repairs a hole, an effective movement direction and movement path must be selected and unnecessary movement avoided, so as to save the energy of the sensor node.
Disclosure of Invention
In view of the defects of, or the need to improve on, the prior art, the invention provides a reinforcement learning-based method for repairing coverage holes in the Internet of Things, which aims to repair coverage holes in an Internet of Things network quickly and effectively. In the repair method, the coverage range of a sensor is defined through the credible information coverage model, so that the spatial correlation of the vulnerability reconstruction points is fully exploited; the directional repair nodes of a vulnerability area are determined by a bidirectional selection method between vulnerability reconstruction points and movable nodes, based on the minimum shortest coordinate distance; and the movement of the directional repair nodes is guided by the reward mechanism of a reinforcement learning method, which saves energy and repair time. The technical problem of Internet of Things coverage hole repair is thereby solved.
In order to achieve the above object, according to one aspect of the present invention, there is provided a reinforcement learning-based method for repairing coverage holes in the Internet of Things, comprising the following steps:
(1) establishing a network model according to the monitoring coverage requirement of a target area;
(1.1) setting a variation range and an estimated root mean square error threshold according to the spatial correlation of the monitored target, and dividing the target coverage area into sub-grids according to the variation range;
(1.2) numbering the sensor nodes as i according to their distribution, and, for the divided coverage sub-grids, taking the center point of each sub-grid as a reconstruction point, denoted p;
(2) establishing a network coverage model through the credible information coverage model, and calculating the coverage rate;
(3) determining the directional repair nodes of the vulnerability area by a bidirectional selection method between vulnerability reconstruction points and movable nodes, based on the minimum shortest coordinate distance;
(4) training the directional repair nodes (M-nodes) with a Q-Learning method, and repairing the vulnerability sub-grids until the coverage rate meets the requirement or the number of iterations reaches a set upper limit.
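Steps (1.1) and (1.2) can be illustrated with a minimal Python sketch. It assumes a rectangular target area and square sub-grids whose side length `cell` has already been derived from the variation range; the text does not state that derivation, and the function name `reconstruction_points` is hypothetical.

```python
from typing import List, Tuple


def reconstruction_points(width: float, height: float,
                          cell: float) -> List[Tuple[float, float]]:
    """Divide a width x height area into square sub-grids of side `cell`
    and return the center of each sub-grid as its reconstruction point p.

    ASSUMPTION: `cell` is supplied by the caller; how it follows from the
    variation range is not specified in the text.
    """
    cols = int(width // cell)
    rows = int(height // cell)
    return [((c + 0.5) * cell, (r + 0.5) * cell)
            for r in range(rows) for c in range(cols)]


# A 4 x 2 area with 2 x 2 sub-grids yields two reconstruction points.
points = reconstruction_points(4.0, 2.0, 2.0)
```

Each returned point is the center of one sub-grid, so the list index can serve as the reconstruction point number.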
In an embodiment of the present invention, the step (2) specifically includes the following sub-steps:
(2.1) in the credible information coverage model, for spatial points that are not sampled, calculating the estimated value of the environment variable at the reconstruction point with an ordinary kriging interpolation function, that is, as a weighted average of the measured values of the sensor nodes $\tau_i$ in the reconstruction neighborhood $Z(p)$; the interpolation weight coefficients $\lambda_i$ of the sensor nodes in the neighborhood satisfy
$$\sum_{i=1}^{n} \lambda_i = 1$$
wherein n is the number of sensor nodes $\tau_i$ in the reconstruction neighborhood $Z(p)$;
(2.2) calculating the root mean square error $\Phi(p)$ of the reconstruction point p with the ordinary kriging interpolation function, the calculation expression being:
$$\Phi(p) = \sqrt{\sum_{i=1}^{n} \lambda_i \, \gamma(\tau_i, p) + \mu(p)}$$
wherein $\lambda_i$ and $\mu(p)$ are solved by steps (2.1.1) and (2.1.2);
(2.3) according to the definition of the credible information coverage model, if $\Phi(p) \le \varepsilon_0$, that is, the root mean square error does not exceed the set coverage threshold, the sub-grid is covered; otherwise, the sub-grid is not covered; recording the numbers of the covered reconstruction points as j', and the numbers of the vulnerability reconstruction points, whose root mean square error is greater than the threshold $\varepsilon_0$, as j'';
(2.4) calculating the coverage rate of the target area.
In one embodiment of the present invention, the calculation formula of the coverage rate in the step (2.4) is as follows:
$$\mathrm{Per} = \frac{\sum_{j'=1}^{m} S_{j'}}{S}$$
wherein S is the total area of the target coverage area, j' is the number of a covered reconstruction point, $S_{j'}$ is the area of the sub-grid in which the covered reconstruction point j' is located, and m is the total number of covered reconstruction points.
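The coverage decision and the coverage rate can be sketched as follows. This is a minimal illustration, assuming the root mean square error of every reconstruction point is already known and following the definition of credible (trusted) information coverage, under which a point is covered when its error does not exceed $\varepsilon_0$. The function and variable names are hypothetical.

```python
from typing import List, Sequence, Tuple


def coverage_rate(rmse: Sequence[float], areas: Sequence[float],
                  eps0: float) -> Tuple[float, List[int], List[int]]:
    """Classify reconstruction points and compute the coverage rate.

    rmse[j]  : root mean square error at reconstruction point j
    areas[j] : area of the sub-grid that contains point j
    eps0     : coverage threshold
    Returns (rate, covered point numbers, vulnerability point numbers).
    """
    covered = [j for j, phi in enumerate(rmse) if phi <= eps0]
    holes = [j for j, phi in enumerate(rmse) if phi > eps0]
    total_area = sum(areas)                      # S
    covered_area = sum(areas[j] for j in covered)
    return covered_area / total_area, covered, holes


rate, covered, holes = coverage_rate(
    rmse=[0.1, 0.5, 0.2, 0.9], areas=[1.0, 1.0, 1.0, 1.0], eps0=0.3)
```

In this example two of the four equally sized sub-grids satisfy the threshold, so the rate is one half and the other two points are reported as vulnerability reconstruction points.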
In one embodiment of the invention, said step (2.1) comprises the following sub-steps:
(2.1.1) the interpolation weight coefficients $\lambda_i$ are the set of optimal solutions that minimizes the kriging variance; introducing a Lagrange multiplier $\mu(p)$ yields a linear kriging system of n+1 equations in n+1 unknowns, which is solved to obtain the interpolation weight coefficients $\lambda_i$:
$$\begin{cases} \sum_{j=1}^{n} \lambda_j \, \gamma(\tau_i, \tau_j) + \mu(p) = \gamma(\tau_i, p), & i = 1, \dots, n \\ \sum_{j=1}^{n} \lambda_j = 1 \end{cases}$$
wherein $\gamma(\tau_i, \tau_j)$ and $\gamma(\tau_i, p)$ are calculated from the variation function (variogram);
(2.1.2) calculating $\gamma(\tau_i, \tau_j)$ and $\gamma(\tau_i, p)$ used in step (2.1.1); a Gaussian variation function (variogram) is selected as the variation function of the environment variable, to describe the spatial correlation between the data collected by the sensor nodes $\tau_i$; the Gaussian variation function is formulated as:
Figure BDA0003402068020000041
Figure BDA0003402068020000042
wherein d isτpFor the sensor node τiAnd the euclidean distance of the reconstruction point p,
Figure BDA0003402068020000043
for the sensor node τiAnd τjEuclidean distance of C0And C is a constant.
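The ordinary kriging computation of steps (2.1.1), (2.1.2) and (2.2) can be sketched in pure Python. The sketch rests on two assumptions that are not confirmed by the text: the Gaussian variogram is taken in the common form $\gamma(d) = C_0 + C\,(1 - e^{-d^2/a^2})$ with a hypothetical range parameter `a`, and the root mean square error is taken as the square root of the standard ordinary kriging variance $\sum_i \lambda_i \gamma(\tau_i, p) + \mu(p)$. All function names are illustrative.

```python
import math


def gaussian_variogram(d, c0=0.0, c=1.0, a=1.0):
    # ASSUMED form: gamma(d) = c0 + c * (1 - exp(-d^2 / a^2)).
    return c0 + c * (1.0 - math.exp(-(d * d) / (a * a)))


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for k in range(col, n + 1):
                    M[r][k] -= f * M[col][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def ordinary_kriging(nodes, p, a=1.0):
    """Return (weights lambda_i, multiplier mu(p), RMSE at p)."""
    n = len(nodes)

    def g(u, v):
        return gaussian_variogram(math.dist(u, v), a=a)

    # n equations sum_j lambda_j*gamma(tau_i,tau_j) + mu = gamma(tau_i,p)
    A = [[g(nodes[i], nodes[j]) for j in range(n)] + [1.0]
         for i in range(n)]
    # plus the constraint sum_j lambda_j = 1
    A.append([1.0] * n + [0.0])
    b = [g(nodes[i], p) for i in range(n)] + [1.0]
    sol = solve(A, b)
    lam, mu = sol[:n], sol[n]
    variance = sum(l * b[i] for i, l in enumerate(lam)) + mu
    return lam, mu, math.sqrt(max(variance, 0.0))


# Two nodes placed symmetrically around p receive equal weights.
lam, mu, rmse = ordinary_kriging([(0.0, 0.0), (2.0, 0.0)], (1.0, 0.0))
```

Because the two nodes are equidistant from p, the symmetry of the system forces both weights to one half, which gives a quick sanity check of the solver.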
In an embodiment of the present invention, the step (3) specifically includes the following sub-steps:
(3.1) selecting fixed nodes and movable nodes from the undamaged sensor nodes; a fixed node is selected for each covered sub-grid, and the fixed node no longer moves, so as to guarantee the coverage of that sub-grid; for each covered sub-grid, calculating the Euclidean distances between the reconstruction point and the sensor nodes within the range of the reconstruction point, selecting the sensor node with the minimum Euclidean distance to the covered reconstruction point as the fixed node of the covered sub-grid, recorded as an F-node, and recording the other sensor nodes as movable nodes, denoted R-nodes;
(3.2) determining a directional repair node for each vulnerability reconstruction point;
In an embodiment of the present invention, the step (3.2) specifically includes the following sub-steps:
(3.2.1) calculating the shortest coordinate distance from each R-node to each vulnerability reconstruction point j''; for each vulnerability reconstruction point j'', selecting an intention repair node from the R-nodes according to the shortest coordinate distance;
(3.2.2) if an R-node is selected as the intention repair node by only one vulnerability reconstruction point, a bidirectional selection is established, and the intention repair node is the directional repair node of that vulnerability reconstruction point; if the same R-node is selected as the intention repair node by several vulnerability reconstruction points, the intention repair node takes the vulnerability reconstruction point at the minimum shortest coordinate distance as its target repair reconstruction point and becomes the directional repair node of that point; a directional repair node is recorded as an M-node;
(3.2.3) deleting the directional repair nodes selected in step (3.2.2) from the set of selectable R-nodes; the other vulnerability reconstruction points, which have not established a bidirectional selection, reselect an intention repair node or directional repair node in the next round;
(3.2.4) recording the shortest coordinate distance between each directional repair node and its target repair reconstruction point as the initial node profit value of that directional repair node;
(3.2.5) repeating (3.2.1), (3.2.2), (3.2.3) and (3.2.4) until every vulnerability reconstruction point has selected a corresponding directional repair node.
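The bidirectional selection of steps (3.2.1) to (3.2.5) can be sketched as a round-based matching. This is an illustrative reading of the steps, not the patented implementation: ties are broken by the lowest index, the loop also stops when no R-node is left, and all names are hypothetical.

```python
def shortest_coordinate_distance(a, b):
    # |ax - bx| + |ay - by|, as defined for four-direction unit moves.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def bidirectional_select(holes, r_nodes):
    """Match each vulnerability reconstruction point to one R-node.

    Returns {hole index: (node index, initial node profit value)}.
    """
    free = set(range(len(r_nodes)))      # selectable R-nodes
    pending = set(range(len(holes)))     # holes without an M-node yet
    match = {}
    while pending and free:
        # (3.2.1) every pending hole picks its nearest free R-node.
        proposals = {}
        for h in sorted(pending):
            node = min(sorted(free), key=lambda k:
                       shortest_coordinate_distance(r_nodes[k], holes[h]))
            proposals.setdefault(node, []).append(h)
        # (3.2.2) a node chosen by several holes keeps the nearest one.
        for node, candidates in proposals.items():
            h = min(candidates, key=lambda c:
                    shortest_coordinate_distance(r_nodes[node], holes[c]))
            # (3.2.4) the distance is the initial node profit value.
            dist = shortest_coordinate_distance(r_nodes[node], holes[h])
            match[h] = (node, dist)
            # (3.2.3) remove the matched pair before the next round.
            pending.discard(h)
            free.discard(node)
    return match


# Both holes first prefer node 0; node 0 keeps the closer hole 0, and
# hole 1 is matched to node 1 in the second round.
result = bidirectional_select(holes=[(0, 0), (1, 0)],
                              r_nodes=[(0, 1), (5, 5)])
```

Every round matches at least one hole, so the loop terminates after at most as many rounds as there are vulnerability reconstruction points.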
In one embodiment of the present invention, the step (4) comprises the following sub-steps:
(4.1) initializing the Q-Learning training model parameters; setting a learning rate $\alpha$ and an exploration probability $\epsilon$, where $\alpha$ and $\epsilon$ are constants in [0, 1]; setting the coverage rate requirement of the target area and the number of training iterations t, establishing a Q table, and initializing the Q value of each sensor node to 0; because the initial repair energy consumption value is 0, the initial utility function value of a node is the initial node profit value recorded in step (3.2.4);
(4.2) selecting an action strategy for the M-node, and updating the state of the sensor node;
(4.3) learning; updating the Q value of the state position corresponding to the sensor node, with the expression:
$$Q(s, a_i, a_{-i}) \leftarrow (1-\alpha)\,Q(s, a_i, a_{-i}) + \alpha\left[R(s, a_i) + \gamma\,\pi(s')\right]$$
wherein $Q(s, a_i, a_{-i})$ is the Q value of sensor node $\tau_i$ selecting strategy $a_i$ in state s;
Figure BDA0003402068020000051
$s'$ is the next state of the sensor node; $\mathrm{Num}(s', a_{-i})$ is the number of times the adjacent nodes in the next state select strategies other than $a_i$; $n(s')$ is the number of undamaged adjacent nodes in the next state; and $Q(s', a_i, a_{-i})$ is the Q value of sensor node $\tau_i$ selecting strategy $a_i$ in state $s'$;
(4.4) repeating; and (4) repeating the steps (2), (3), (4.2) and (4.3) until the coverage rate meets the requirement or the iteration number reaches the set upper limit.
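The update rule of step (4.3) can be written as a small Python sketch. It assumes that $\gamma$ in the update is the usual discount factor and that the value $\pi(s')$ has already been computed by the caller from its own definition, which is not reproduced here. The key layout of the Q table and the function name are hypothetical.

```python
from collections import defaultdict


def q_update(q_table, key, reward, pi_next, alpha=0.5, gamma=0.9):
    """Apply Q <- (1 - alpha) * Q + alpha * (R + gamma * pi(s')).

    key     : (state, own strategy a_i, strategies a_-i of the others)
    reward  : utility R(s, a_i) obtained for the chosen strategy
    pi_next : pi(s'), supplied by the caller (ASSUMPTION: precomputed)
    """
    q_table[key] = ((1 - alpha) * q_table[key]
                    + alpha * (reward + gamma * pi_next))
    return q_table[key]


q = defaultdict(float)                  # every Q value starts at 0
key = ((2, 3), "up", ("left",))         # hypothetical state encoding
value = q_update(q, key, reward=1.0, pi_next=2.0)
```

With a zero initial Q value the first update simply returns $\alpha\,(R + \gamma\,\pi(s'))$, which makes the effect of the learning rate easy to inspect.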
In an embodiment of the present invention, the step (4.2) specifically includes the following sub-steps:
(4.2.1) exploration; generating a random number and, with probability $\epsilon$, randomly selecting one of the four strategies of moving up, down, left or right as $a_i$;
(4.2.2) exploitation; with probability $1-\epsilon$, selecting the strategy $a_i$ by considering the combined effect of the Q value of the M-node and the strategies of the other sensor nodes, such that $a_i$ satisfies
Figure BDA0003402068020000061
wherein s represents the current state of the sensor node, $a_{-i}$ represents the strategies in the strategy space other than $a_i$, $\mathrm{Num}(s, a_{-i})$ is the number of times the adjacent nodes select strategies other than $a_i$, and $n(s)$ is the number of undamaged adjacent nodes;
(4.2.3) calculating the utility function R; the utility function value is jointly determined by the node profit value $P(a_i, a_{-i})$ and the repair energy consumption value $C(a_i)$; the expression is:
$$R(a_i, a_{-i}) = \mu_\alpha P(a_i, a_{-i}) - \mu_\beta C(a_i)$$
wherein $R(a_i, a_{-i})$ represents the utility value when the sensor node selects strategy $a_i$, $P(a_i, a_{-i})$ is the node profit value when the sensor node selects strategy $a_i$, and $C(a_i)$ is the repair energy consumption value when the sensor node selects strategy $a_i$; $\mu_\alpha$ and $\mu_\beta$ are weight coefficients that balance the node profit value against the repair energy consumption; the node profit value is determined by the shortest coordinate distance from the sensor node to the target repair reconstruction point, and the repair energy consumption value is determined by the moving distance of the sensor node, with the expressions:
Figure BDA0003402068020000062
Figure BDA0003402068020000063
wherein $\omega$ is a constant that balances the orders of magnitude of the moving distance and the energy consumption;
Figure BDA0003402068020000064
is the shortest coordinate distance from the sensor node to the target repair reconstruction point; c is a constant that discourages unnecessary back-and-forth movement of the sensor node; and $\Delta d_i$ is the moving distance;
(4.2.4) node state protection; the state information of a sensor node comprises its position information, Q value, strategy value and utility function value; if, when the sensor node would execute the strategy, the utility function value is smaller than the utility value of the previous state, namely $R_i(t) < R_i(t-1)$, the current action strategy is not executed and the node remains in the current state s; otherwise, the current strategy is executed and the sensor node enters the next state $s'$.
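Steps (4.2.1) to (4.2.4) can be sketched together in Python. The sketch is deliberately simplified and makes these assumptions: the exploitation step picks the strategy with the largest plain Q value instead of the joint expression that involves the neighbors' strategies, the node profit value and repair energy consumption value are passed in as numbers, and the grid uses unit moves with y increasing upward. All names are hypothetical.

```python
import random

# Four movement strategies, each one unit step on the grid.
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


def utility(profit, cost, mu_alpha=1.0, mu_beta=1.0):
    """R = mu_alpha * P - mu_beta * C, as in step (4.2.3)."""
    return mu_alpha * profit - mu_beta * cost


def choose_action(q_values, epsilon, rng):
    """Epsilon-greedy choice: explore with probability epsilon.

    SIMPLIFICATION: exploitation uses the node's own Q values only.
    """
    if rng.random() < epsilon:
        return rng.choice(sorted(ACTIONS))           # (4.2.1) exploration
    return max(sorted(ACTIONS),
               key=lambda a: q_values.get(a, 0.0))    # (4.2.2) exploitation


def step_with_protection(pos, action, prev_utility, new_utility):
    """(4.2.4) keep the current state if the utility would decrease."""
    if new_utility < prev_utility:
        return pos, prev_utility
    dx, dy = ACTIONS[action]
    return (pos[0] + dx, pos[1] + dy), new_utility


rng = random.Random(0)
action = choose_action({"up": 1.0}, epsilon=0.0, rng=rng)
kept = step_with_protection((0, 0), "right", prev_utility=1.0,
                            new_utility=0.5)
moved = step_with_protection((0, 0), "right", prev_utility=1.0,
                             new_utility=2.0)
```

The state protection rule means a node only ever moves when the move does not lower its utility, which is how unnecessary back-and-forth movement is suppressed in this reading.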
Generally, compared with the prior art, the technical scheme of the invention has the following beneficial effects:
(1) the overall coverage rate after repair is high: the spatial correlation of the reconstruction points monitored in the target area is fully exploited from the perspective of information cooperation, and the root mean square error is used to estimate the coverage error and complete the coverage prediction, which improves the coverage rate;
(2) the repair is fast: the Q-Learning method of reinforcement learning uses the distance between the sensor node and the vulnerability reconstruction point as the reward, so that the sensor node is guided to move quickly, unnecessary movement is reduced, and repair time is saved;
(3) the energy consumption is low: the bidirectional selection method adopted by the invention selects directional repair nodes for the vulnerability sub-grids, which avoids the movement and energy consumption of unnecessary sensor nodes; meanwhile, the Q-Learning method guides each directional repair node to move over the shortest distance to repair the hole, which saves repair energy;
(4) the method is widely applicable: the Internet of Things is a general-purpose network suited to various application scenarios, and the credible information coverage model adopted in the invention can be applied to coverage over different terrains and regions and for different data monitoring targets.
Drawings
FIG. 1 is a flowchart of a coverage hole repairing method based on reinforcement learning according to an embodiment of the present invention;
FIG. 2 is a schematic representation of the utility values of sensor nodes in an embodiment of the present invention;
FIG. 3 is an example of visualized coverage hole repair results in an embodiment of the present invention, where:
FIG. 3(a) is a schematic diagram of the coverage at the initial time;
FIG. 3(b) is a schematic diagram of the first coverage hole, at training iteration t = 1;
FIG. 3(c) is a schematic diagram at training iteration t = 99;
FIG. 3(d) is a schematic diagram of the second coverage hole, at training iteration t = 100;
FIG. 3(e) is a schematic diagram at training iteration t = 199;
FIG. 3(f) is a schematic diagram of the third coverage hole, at training iteration t = 200;
FIG. 3(g) is a schematic diagram at training iteration t = 299;
FIG. 3(h) is a schematic diagram of the fourth coverage hole, at training iteration t = 300;
FIG. 3(i) is a schematic diagram at training iteration t = 500;
In all the figures, the same reference numerals are used to denote the same elements or structures, wherein: F-node represents a fixed node, R-node represents a movable node, M-node represents a directional repair node, $Rp_j(Rx_j, Ry_j)$ represents a vulnerability sub-grid reconstruction point to be repaired, CR represents the variation range, Num represents the number of normal sensor nodes in the network in the current state, t represents the training iteration, and Per represents the coverage rate of the target area at the current training iteration.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The technical terms of the present invention are explained first:
Variation range (CR): a distance threshold characterizing the spatial correlation of the environment variable. For a particular environment variable and spatial point, only the values at other spatial points within the variation range are correlated with the current spatial point.
Root mean square error (RMSE): a measure of reconstruction and estimation quality for the values of the environment variable at unsampled spatial points, that is, a measure of the error between the estimated values and the reference values.
Credible information coverage (also rendered as trusted information coverage): in a target monitoring area, if the root mean square error of the reconstructed information at a spatial point in the area is less than or equal to a threshold $\varepsilon_0$ set by the requirements of the practical application, the spatial point is covered by credible information.
Euclidean distance: the absolute distance between two points or vectors in a multidimensional space, that is, the square root of the sum of the squared coordinate differences. The Euclidean distance from point $A(a_x, a_y)$ to point $B(b_x, b_y)$ is
$$d_{AB} = \sqrt{(a_x - b_x)^2 + (a_y - b_y)^2}$$
Shortest coordinate distance: in each movement, the sensor node moves 1 unit distance in one of the four directions up, down, left or right, so the shortest coordinate distance from point $A(a_x, a_y)$ to point $B(b_x, b_y)$ is $|a_x - b_x| + |a_y - b_y|$.
Kriging interpolation: the kriging method is essentially a moving weighted average method and has the characteristics of being optimal, linear and unbiased. Kriging is a regression algorithm that spatially models and predicts (interpolates) random processes or random fields according to a covariance function. The kriging method gives the best linear unbiased estimate for specific stochastic processes, such as an intrinsically stationary process, and is therefore also referred to in geostatistics as a spatially optimal unbiased estimator.
Q-Learning: a model-free reinforcement learning algorithm that takes Markov decision processes (MDPs) as its theoretical basis and, through interaction with the environment, continuously improves and finally obtains an optimal behavior strategy.
Utility function: in Q-Learning, a function that measures the return after a strategy is selected and is used to evaluate the quality of taking a certain action in a specific state.
Adjacent nodes: the other sensor nodes within the communication range $R_c$ of sensor node $\tau_i$ are the adjacent nodes of $\tau_i$.
Q value: the value used in Q-Learning to evaluate an action; it represents the expected sum of rewards obtained by the agent from selecting the action until the final state.
$\arg\max f(x)$ function: returns the value of x at which $f(x)$ takes its maximum value.
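The two distance definitions above can be compared with a short Python example. The point values are arbitrary and the function names are illustrative.

```python
import math


def euclidean_distance(a, b):
    # Square root of the sum of squared coordinate differences.
    return math.hypot(a[0] - b[0], a[1] - b[1])


def shortest_coordinate_distance(a, b):
    # Number of unit moves needed when only up, down, left and right
    # moves are allowed: |ax - bx| + |ay - by|.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


A, B = (1, 2), (4, 6)
d_euclid = euclidean_distance(A, B)
d_coord = shortest_coordinate_distance(A, B)
```

The shortest coordinate distance is never smaller than the Euclidean distance, because a node restricted to four directions cannot take the diagonal.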
The solution to the difficulties existing in the prior art is as follows:
For the first difficulty, most existing methods adopt a disc model to define the coverage range of a sensor node, and that model is overly idealized and simple. With the credible information coverage model, information can be reconstructed cooperatively in the spatial domain, and the degree of coverage of a reconstruction point is predicted from the correlation between spatial points. The credible information coverage model makes full use of spatial correlation and applies well in practice. For the second difficulty, most existing methods select repair nodes based on the shortest path and repair holes without designating directional repair nodes. Because network nodes then move globally or locally, these methods generally consume a large amount of energy. Bidirectional selection between vulnerability reconstruction points and movable nodes, based on the shortest path, improves on this considerably and greatly reduces the movement energy consumed by unnecessary repair nodes. For the third difficulty, existing Voronoi diagram methods are commonly used for spatial geometric division and path selection, and topology control methods adjust network coverage by adjusting the positions and sensing radii of the sensor nodes. However, these methods suffer from excessive energy consumption or a large number of repetitions. The reinforcement learning method guides the movement of the sensor nodes through a reward mechanism and improves both energy utilization and repair speed.
As shown in fig. 1, the reinforcement learning-based internet of things coverage vulnerability repair method of the present invention includes the following steps:
(1) Establishing a network model according to the monitoring coverage requirement of the target area.
(1.1) Setting a variation range and an estimated root mean square error threshold according to the spatial correlation of the monitored target, and dividing the target coverage area into sub-grids according to the variation range.
(1.2) Numbering the sensor nodes as i according to their distribution, and, for the divided coverage sub-grids, taking the center point of each sub-grid as a reconstruction point, denoted p.
(2) Establishing a network coverage model through the credible information coverage model, and calculating the coverage rate. The method specifically comprises the following sub-steps:
(2.1) In the credible information coverage model, for spatial points that are not sampled, the estimated value of the environment variable at the reconstruction point is calculated with an ordinary kriging interpolation function, that is, as a weighted average of the measured values of the sensor nodes $\tau_i$ in the reconstruction neighborhood $Z(p)$. The interpolation weight coefficients $\lambda_i$ of the sensor nodes in the neighborhood satisfy
$$\sum_{i=1}^{n} \lambda_i = 1$$
where n is the number of sensor nodes $\tau_i$ in the reconstruction neighborhood $Z(p)$. The calculation of $\lambda_i$ comprises the following sub-steps:
(2.1.1) The interpolation weight coefficients $\lambda_i$ are the set of optimal solutions that minimizes the kriging variance. Introducing a Lagrange multiplier $\mu(p)$ yields a linear kriging system of n+1 equations in n+1 unknowns, which is solved to obtain the interpolation weight coefficients $\lambda_i$:
$$\begin{cases} \sum_{j=1}^{n} \lambda_j \, \gamma(\tau_i, \tau_j) + \mu(p) = \gamma(\tau_i, p), & i = 1, \dots, n \\ \sum_{j=1}^{n} \lambda_j = 1 \end{cases}$$
where $\gamma(\tau_i, \tau_j)$ and $\gamma(\tau_i, p)$ are calculated from the variation function (variogram).
(2.1.2) Calculating $\gamma(\tau_i, \tau_j)$ and $\gamma(\tau_i, p)$ used in step (2.1.1). A Gaussian variation function (variogram) is selected as the variation function of the environment variable, to describe the spatial correlation between the data collected by the sensor nodes $\tau_i$. The general formula of the Gaussian variation function is:
Figure BDA0003402068020000112
Figure BDA0003402068020000113
wherein d isτpFor the sensor node τiAnd the Euclidean distance, d, of the reconstruction point pτiτjFor the sensor node τiAnd τjEuclidean distance of C0And C is a constant when C0When C is 0 and 1, it is a standard gaussian function.
(2.2) Calculating the root mean square error $\Phi(p)$ of the reconstruction point p with the ordinary kriging interpolation function, the calculation expression being:
$$\Phi(p) = \sqrt{\sum_{i=1}^{n} \lambda_i \, \gamma(\tau_i, p) + \mu(p)}$$
where $\lambda_i$ and $\mu(p)$ are solved by steps (2.1.1) and (2.1.2).
(2.3) According to the definition of the credible information coverage model, if $\Phi(p) \le \varepsilon_0$, that is, the root mean square error does not exceed the set coverage threshold, the sub-grid is covered; otherwise it is not covered. The numbers of the covered reconstruction points are recorded as j', and the numbers of the vulnerability reconstruction points, whose root mean square error is greater than the threshold $\varepsilon_0$, are recorded as j''.
(2.4) Calculating the coverage rate of the target area, with the calculation formula:
$$\mathrm{Per} = \frac{\sum_{j'=1}^{m} S_{j'}}{S}$$
where S is the total area of the target coverage area, j' is the number of a covered reconstruction point, $S_{j'}$ is the area of the sub-grid in which the covered reconstruction point j' is located, and m is the total number of covered reconstruction points.
(3) And determining directional repair nodes of the vulnerability area by adopting a vulnerability reconstruction point and movable node minimum shortest coordinate distance bidirectional selection method. The method specifically comprises the following substeps:
(3.1) Select fixed nodes and movable nodes from the undamaged sensor nodes. A fixed node is selected for each covered sub-grid and no longer moves, which preserves the coverage of that sub-grid. For each covered sub-grid, compute the Euclidean distance between its reconstruction point and every sensor node within range of that point; the sensor node with the minimum Euclidean distance to the covered reconstruction point is selected as the fixed node of the sub-grid and denoted F-node. All other sensor nodes are movable nodes, denoted R-nodes.
(3.2) Determine a directional repair node for each vulnerability reconstruction point. This step comprises the following substeps:
(3.2.1) Calculate the shortest coordinate distance from each R-node to the vulnerability reconstruction point j''. According to this distance, each vulnerability reconstruction point j'' selects an intended repair node from the R-nodes.
(3.2.2) If an R-node is selected as the intended repair node by only one vulnerability reconstruction point, a bidirectional selection is established, and that R-node becomes the directional repair node of the vulnerability reconstruction point. If the same R-node is selected as the intended repair node by several vulnerability reconstruction points, it chooses the vulnerability reconstruction point at the smallest shortest coordinate distance as its target repair reconstruction point and becomes the directional repair node of that point. A directional repair node is denoted M-node.
(3.2.3) Delete the directional repair nodes selected in step (3.2.2) from the set of selectable R-nodes. The vulnerability reconstruction points that have not established a bidirectional selection reselect an intended repair node in the next round.
(3.2.4) Record the shortest coordinate distance between each directional repair node and its target repair reconstruction point as the initial node profit value of that directional repair node.
(3.2.5) Repeat (3.2.1), (3.2.2), (3.2.3) and (3.2.4) until all vulnerability reconstruction points have selected corresponding directional repair nodes.
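Steps (3.1) and (3.2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function names `split_fixed_movable`, `assign_repair_nodes` and `coord_distance`, the `radius` parameter and the dictionary layout are hypothetical, and the "shortest coordinate distance" is assumed here to be the Manhattan distance, which the text does not confirm.

```python
import math


def coord_distance(a, b):
    # Assumption: "shortest coordinate distance" is read as Manhattan distance.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def split_fixed_movable(nodes, covered_points, radius):
    """Step (3.1): the node nearest to each covered point (within radius) is an F-node."""
    fixed = set()
    for p in covered_points:
        in_range = [i for i, pos in nodes.items() if math.dist(pos, p) <= radius]
        if in_range:
            fixed.add(min(in_range, key=lambda i: (math.dist(nodes[i], p), i)))
    movable = set(nodes) - fixed  # all remaining nodes are R-nodes
    return fixed, movable


def assign_repair_nodes(nodes, movable, hole_points):
    """Steps (3.2.1)-(3.2.5): round-based bidirectional selection of M-nodes."""
    free = set(movable)
    pending = set(range(len(hole_points)))
    assignment = {}  # hole index -> (M-node id, initial node profit value)
    while pending and free:
        # (3.2.1) every unmatched hole picks its nearest free R-node
        wanted = {}
        for h in pending:
            n = min(free, key=lambda i: (coord_distance(nodes[i], hole_points[h]), i))
            wanted.setdefault(n, []).append(h)
        # (3.2.2) every picked node accepts the nearest hole that picked it
        for n, holes in wanted.items():
            h = min(holes, key=lambda k: (coord_distance(nodes[n], hole_points[k]), k))
            # (3.2.4) the distance is stored as the initial node profit value
            assignment[h] = (n, coord_distance(nodes[n], hole_points[h]))
            free.discard(n)      # (3.2.3) remove the M-node from the R-node set
            pending.discard(h)
    return assignment


nodes = {0: (0, 0), 1: (1, 0), 2: (5, 5), 3: (9, 9)}
fixed, movable = split_fixed_movable(nodes, covered_points=[(0, 0)], radius=2)
matches = assign_repair_nodes(nodes, movable, hole_points=[(6, 5), (6, 6)])
```

In this toy layout both holes first pick node 2; node 2 accepts the closer hole, and the other hole reselects in the next round. The sketch stops when no free R-node remains, a case the text does not address.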
(4) Train the directional repair nodes (M-nodes) with the Q-Learning method and repair the vulnerability sub-grids until the coverage rate meets the requirement or the number of iterations reaches the set upper limit. This step comprises the following substeps:
(4.1) Initialize the Q-Learning training model parameters. Set the learning rate α and the exploration probability ε, both constants in [0, 1]. Set the coverage rate requirement of the target area and the number of training iterations t, establish a Q table, and initialize the Q value of every sensor node to 0. Because the initial repair energy consumption value is 0, the initial utility function value of a node equals the initial node profit value recorded in step (3.2.4).
(4.2) Select an action strategy for each M-node and update the state of the sensor node. This step comprises the following substeps:
(4.2.1) Exploration. Generate a random number; with probability ε, randomly select one of the four strategies (move up, down, left or right) as $a_i$.
(4.2.2) Exploitation. With probability $1-\varepsilon$, select the strategy $a_i$ by considering the joint effect of the Q value of the M-node and the strategies of the other sensor nodes, such that $a_i$ satisfies:
Figure BDA0003402068020000131
where s denotes the current state of the sensor node, $a_{-i}$ denotes the strategies in the strategy space other than $a_i$, $Num(s, a_{-i})$ is the number of times adjacent nodes select strategies other than $a_i$, and $n(s)$ is the number of undamaged adjacent nodes.
(4.2.3) Calculate the utility function R. The utility function value is jointly determined by the node profit value $P(a_i, a_{-i})$ and the repair energy consumption value $C(a_i)$:
$R(a_i, a_{-i}) = \mu_\alpha P(a_i, a_{-i}) - \mu_\beta C(a_i)$
where $R(a_i, a_{-i})$ is the utility value when the sensor node selects strategy $a_i$, $P(a_i, a_{-i})$ is the node profit value when the sensor node selects strategy $a_i$, and $C(a_i)$ is the repair energy consumption value when the sensor node selects strategy $a_i$. $\mu_\alpha$ and $\mu_\beta$ are weight coefficients that balance the node profit value against the repair energy consumption. The node profit value is determined by the shortest coordinate distance from the sensor node to the target vulnerability repair point, as shown in Fig. 2, and the repair energy consumption value is determined by the moving distance of the sensor node. The expressions are:
Figure BDA0003402068020000141
Figure BDA0003402068020000142
where ω is a constant that balances the orders of magnitude of the moving distance and the energy consumption,
Figure BDA0003402068020000143
is the shortest coordinate distance from the sensor node to the target repair reconstruction point, c is a constant that discourages unnecessary frequent back-and-forth movement of the sensor node, and $\Delta d_i$ is the moving distance.
(4.2.4) Node state protection. The state information of a sensor node includes its position, Q value, strategy value, utility function value and so on. If executing the strategy would make the utility function value smaller than that of the previous state, that is, $R_i(t) < R_i(t-1)$, the current action strategy is not executed and the node keeps its current state s. Otherwise the current strategy is executed and the sensor node enters the next state s'.
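Substeps (4.2.1) to (4.2.4) can be illustrated with the following Python sketch. Several parts are assumptions rather than the formulas of the embodiment, because the formula images are not reproduced in this text: the exploitation rule is assumed to weight each Q value by the frequency $Num(s, a_{-i})/n(s)$ of the neighbors' strategies, the profit is a placeholder 1/(1 + d), the energy consumption is a placeholder ω·Δd + c, and the distance is taken as Manhattan distance. All function names, parameter defaults and the Q-table layout (a dictionary keyed by state, own strategy and neighbor strategy) are hypothetical.

```python
import random

ACTIONS = ("up", "down", "left", "right")
STEP = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


def expected_q(q, s, a_i, neighbor_counts):
    # Assumed reading: Q values weighted by how often neighbors chose each strategy.
    n = sum(neighbor_counts.values())
    if n == 0:
        return q.get((s, a_i, None), 0.0)
    return sum(cnt / n * q.get((s, a_i, a_o), 0.0)
               for a_o, cnt in neighbor_counts.items())


def choose_action(q, s, neighbor_counts, epsilon, rng=random):
    if rng.random() < epsilon:  # (4.2.1) exploration
        return rng.choice(ACTIONS)
    # (4.2.2) exploitation: best frequency-weighted Q value
    return max(ACTIONS, key=lambda a: expected_q(q, s, a, neighbor_counts))


def utility(dist_to_target, move_dist, mu_a=1.0, mu_b=1.0, omega=0.1, c=0.05):
    profit = 1.0 / (1.0 + dist_to_target)  # placeholder for P
    cost = omega * move_dist + c           # placeholder for C
    return mu_a * profit - mu_b * cost     # R = mu_alpha * P - mu_beta * C


def protected_move(pos, action, target, prev_utility, step=1.0):
    """(4.2.4) Execute the move only if the utility does not decrease."""
    dx, dy = STEP[action]
    new_pos = (pos[0] + dx * step, pos[1] + dy * step)
    d = abs(new_pos[0] - target[0]) + abs(new_pos[1] - target[1])
    r = utility(d, step)
    if r < prev_utility:
        return pos, prev_utility, False  # keep the current state s
    return new_pos, r, True              # enter the next state s'


q = {("s", "up", "left"): 1.0, ("s", "right", "left"): 2.0}
best = choose_action(q, "s", {"left": 2, "up": 1}, epsilon=0.0)
accepted = protected_move((0, 0), "right", (3, 0), prev_utility=0.1)
rejected = protected_move((0, 0), "left", (3, 0), prev_utility=0.1)
```

With these placeholder forms, a step toward the target raises the utility and is executed, while a step away lowers it and is discarded, which is the behavior the node state protection rule describes.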
(4.3) Learning. Update the Q value of the state position corresponding to the sensor node according to:
$Q(s, a_i, a_{-i}) \leftarrow (1-\alpha)\,Q(s, a_i, a_{-i}) + \alpha\,[R(s, a_i) + \gamma\,\pi(s')]$
where $Q(s, a_i, a_{-i})$ is the Q value of sensor node $\tau_i$ selecting strategy $a_i$ in state s.
Figure BDA0003402068020000144
s' is the next state of the sensor node, $Num(s', a_{-i})$ is the number of times adjacent nodes select strategies other than $a_i$ in the next state, $n(s')$ is the number of undamaged adjacent nodes in the next state, and $Q(s', a_i, a_{-i})$ is the Q value of sensor node $\tau_i$ selecting strategy $a_i$ in state s'.
(4.4) Repeat. Repeat steps (2), (3), (4.2) and (4.3) until the coverage rate meets the requirement or the number of iterations reaches the set upper limit.
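The update in step (4.3) can be written as a short Python function. This is a sketch under assumptions: the definition of $\pi(s')$ is given only as a formula image, so it is assumed here to be the maximum over $a_i$ of the Q values in state s' weighted by $Num(s', a_{-i})/n(s')$; γ is treated as the conventional discount factor, which the text does not define; and the function name `q_update` and the dictionary-based Q table are hypothetical.

```python
def q_update(q, s, a_i, a_others, reward, s_next, next_counts,
             alpha, gamma, actions=("up", "down", "left", "right")):
    """Q(s,a_i,a_-i) <- (1 - alpha) * Q(s,a_i,a_-i) + alpha * (R + gamma * pi(s'))."""

    def pi(state, counts):
        # Assumed form of pi(s'): best frequency-weighted Q value in the next state.
        n = sum(counts.values())
        if n == 0:
            return 0.0
        return max(
            sum(c / n * q.get((state, a, o), 0.0) for o, c in counts.items())
            for a in actions
        )

    key = (s, a_i, a_others)
    q[key] = (1 - alpha) * q.get(key, 0.0) + alpha * (reward + gamma * pi(s_next, next_counts))
    return q[key]


q_table = {("s1", "up", "left"): 4.0}
new_q = q_update(q_table, "s0", "right", "left", reward=1.0,
                 s_next="s1", next_counts={"left": 1}, alpha=0.5, gamma=0.5)
```

Here the old Q value is 0, $\pi(s')$ evaluates to 4, and the updated entry is 0.5 × (1 + 0.5 × 4) = 1.5. In a full training loop this function would be called once per M-node per iteration, after the move of step (4.2) has been accepted or rejected.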
Fig. 3 shows an example visualization of vulnerability repair in the embodiment of the present invention: Fig. 3(a) shows the coverage at the initial time; Fig. 3(b) shows the first coverage vulnerability, at training iteration t = 1; Fig. 3(c) shows training iteration t = 99; Fig. 3(d) shows the second coverage vulnerability, at training iteration t = 100; Fig. 3(e) shows training iteration t = 199; Fig. 3(f) shows the third coverage vulnerability, at training iteration t = 200; Fig. 3(g) shows training iteration t = 299; Fig. 3(h) shows the fourth coverage vulnerability, at training iteration t = 300; Fig. 3(i) shows training iteration t = 500. As can be seen from Fig. 3, the method avoids unnecessary ineffective node movement, and the repair effect is evident.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (8)

1. An Internet of things coverage vulnerability repairing method based on reinforcement learning is characterized by comprising the following steps:
(1) establishing a network model according to the monitoring coverage requirement of a target area;
(1.1) setting a variable range and an estimated root mean square error threshold according to the spatial correlation of a detection target, and performing area sub-grid division on a target coverage area according to the variable range;
(1.2) recording the number i of the sensor nodes according to the distribution of the sensor nodes, and taking the center point of each sub-grid as a reconstruction point and expressing the center point as p according to the divided coverage sub-grids;
(2) establishing a network coverage model through the credible information coverage model, and calculating the coverage rate;
(3) determining the directional repair nodes of the vulnerability area by a bidirectional selection method between vulnerability reconstruction points and movable nodes based on the minimum shortest coordinate distance;
(4) training the directional repair nodes (M-nodes) by the Q-Learning method, and repairing the vulnerability sub-grids until the coverage rate meets the requirement or the number of iterations reaches a set upper limit.
2. The reinforcement learning-based internet of things coverage hole repairing method according to claim 1, wherein the step (2) specifically comprises the following substeps:
(2.1) in the credible information coverage model, for spatial points that are not sampled, calculating the estimated value of the environment variable at the reconstruction point by ordinary kriging interpolation, that is, as a weighted average of the measured values of the sensor nodes $\tau_i$ in the reconstruction neighborhood Z(p); the interpolation weight coefficients $\lambda_i$ of the sensor nodes in the neighborhood satisfy
Figure FDA0003402068010000011
where n is the number of sensor nodes $\tau_i$ in the reconstruction neighborhood Z(p);
(2.2) calculating the root mean square error $\Phi(p)$ of the reconstruction point p based on ordinary kriging interpolation, with the expression:
Figure FDA0003402068010000012
wherein
Figure FDA0003402068010000013
and $\mu(p)$ are solved by steps (2.1.1) and (2.1.2);
(2.3) according to the definition of the credible information coverage model, if $\Phi(p) \le \varepsilon_0$, that is, the root mean square error does not exceed the set coverage threshold, the sub-grid is covered; otherwise the sub-grid is not covered; recording the numbers of the covered reconstruction points as j', and the numbers of the vulnerability reconstruction points, whose root mean square error is greater than the threshold $\varepsilon_0$, as j'';
(2.4) calculating the coverage rate of the target area.
3. The reinforcement learning-based internet of things coverage hole repairing method according to claim 2, wherein the coverage rate in the step (2.4) is calculated according to the following formula:
Figure FDA0003402068010000021
wherein S is the total area of the coverage area, j' is the number of a covered reconstruction point, $S_{j'}$ is the area of the sub-grid in which the covered reconstruction point j' is located, and m is the total number of covered reconstruction points.
4. The reinforcement learning-based internet of things coverage vulnerability repair method according to claim 2 or 3, wherein the step (2.1) comprises the following sub-steps:
(2.1.1) obtaining the optimal interpolation weight coefficients $\lambda_i$ by minimizing the kriging variance; introducing a Lagrange multiplier $\mu(p)$ yields a linear kriging system of n + 1 equations in n + 1 unknowns, which is solved to obtain the interpolation weight coefficients $\lambda_i$:
Figure FDA0003402068010000022
wherein $\gamma(\tau_i, \tau_j)$ and $\gamma(\tau_i, p)$ are calculated by the variogram;
(2.1.2) calculating $\gamma(\tau_i, \tau_j)$ and $\gamma(\tau_i, p)$ in step (2.1.1); selecting a Gaussian variogram as the variogram of the environment variable, to describe the spatial correlation between the data collected by the sensor nodes $\tau_i$; the Gaussian variogram is formulated as:
Figure FDA0003402068010000031
Figure FDA0003402068010000032
wherein $d_{\tau p}$ is the Euclidean distance between the sensor node $\tau_i$ and the reconstruction point p,
Figure FDA0003402068010000033
is the Euclidean distance between the sensor nodes $\tau_i$ and $\tau_j$, and $C_0$ and $C$ are constants.
5. The reinforcement learning-based internet of things coverage vulnerability repair method according to claim 1 or 2, wherein the step (3) specifically comprises the following sub-steps:
(3.1) selecting a fixed node and a movable node from the undamaged sensor nodes; selecting a fixed node for the covered sub-grid, wherein the fixed node does not move any more so as to ensure the coverage of the covered sub-grid; calculating Euclidean distances between the sensor nodes in the range of the reconstruction point of each covered sub-grid and the reconstruction point, selecting the sensor node with the minimum Euclidean distance from the covered reconstruction point as a fixed node of the covered grid and recording the sensor node as an F-node, and recording other sensor nodes as movable nodes and recording the movable nodes as R-nodes;
(3.2) determining a directional repair node for each vulnerability reconstruction point.
6. The reinforcement learning-based internet of things coverage hole repairing method according to claim 5, wherein the step (3.2) specifically comprises the following sub-steps:
(3.2.1) calculating the shortest coordinate distance from each R-node to the vulnerability reconstruction point j''; according to the shortest coordinate distance, each vulnerability reconstruction point j'' selecting an intended repair node from the R-nodes;
(3.2.2) if an R-node is selected as the intended repair node by only one vulnerability reconstruction point, establishing a bidirectional selection, the intended repair node being the directional repair node of that vulnerability reconstruction point; if the same R-node is selected as the intended repair node by a plurality of vulnerability reconstruction points, the intended repair node taking the vulnerability reconstruction point at the smallest shortest coordinate distance as its target repair reconstruction point and being the directional repair node of the selected target repair reconstruction point; denoting a directional repair node as an M-node;
(3.2.3) deleting the directional repair nodes selected in step (3.2.2) from the set of selectable R-nodes, the vulnerability reconstruction points that have not established a bidirectional selection reselecting an intended repair node in the next round;
(3.2.4) recording the shortest coordinate distance between the directional repair node and the target repair reconstruction point of the directional repair node as an initial node profit value of the directional repair node;
(3.2.5) repeating (3.2.1), (3.2.2), (3.2.3) and (3.2.4) until all vulnerability reconstruction points have selected corresponding directional repair nodes.
7. The reinforcement learning-based internet of things coverage vulnerability repair method according to claim 1 or 2, wherein the step (4) comprises the following sub-steps:
(4.1) initializing Q-Learning training model parameters; setting the learning rate α and the exploration probability ε, wherein α and ε are constants in [0, 1]; setting the coverage rate requirement of the target area and the number of training iterations t, establishing a Q table, and initializing the Q value of each sensor node to 0; the initial repair energy consumption value being 0, the initial utility function value of a node being the initial node profit value recorded in step (3.2.4);
(4.2) selecting an action strategy for the M-node, and updating the state of the sensor node;
(4.3) learning; updating the Q value of the state position corresponding to the sensor node, with the expression:
$Q(s, a_i, a_{-i}) \leftarrow (1-\alpha)\,Q(s, a_i, a_{-i}) + \alpha\,[R(s, a_i) + \gamma\,\pi(s')]$
wherein $Q(s, a_i, a_{-i})$ is the Q value of the sensor node $\tau_i$ selecting strategy $a_i$ in state s;
Figure FDA0003402068010000041
s' is the next state of the sensor node; $Num(s', a_{-i})$ is the number of times adjacent nodes select strategies other than $a_i$ in the next state; $n(s')$ is the number of undamaged adjacent nodes in the next state; and $Q(s', a_i, a_{-i})$ is the Q value of the sensor node $\tau_i$ selecting strategy $a_i$ in state s';
(4.4) repeating; repeating steps (2), (3), (4.2) and (4.3) until the coverage rate meets the requirement or the number of iterations reaches the set upper limit.
8. The reinforcement learning-based internet of things coverage vulnerability repair method according to claim 1 or 2, wherein the step (4.2) specifically comprises the following sub-steps:
(4.2.1) exploration; generating a random number, and with probability ε randomly selecting one of the four strategies of moving up, down, left and right as $a_i$;
(4.2.2) exploitation; with probability $1-\varepsilon$, selecting the strategy $a_i$ by considering the joint effect of the Q value of the M-node and the strategies of the other sensor nodes, such that $a_i$ satisfies:
Figure FDA0003402068010000051
wherein s denotes the current state of the sensor node, $a_{-i}$ denotes the strategies in the strategy space other than $a_i$, $Num(s, a_{-i})$ is the number of times adjacent nodes select strategies other than $a_i$, and $n(s)$ is the number of undamaged adjacent nodes;
(4.2.3) calculating the utility function R; the utility function value being jointly determined by the node profit value $P(a_i, a_{-i})$ and the repair energy consumption value $C(a_i)$, with the expression:
$R(a_i, a_{-i}) = \mu_\alpha P(a_i, a_{-i}) - \mu_\beta C(a_i)$
wherein $R(a_i, a_{-i})$ is the utility value when the sensor node selects strategy $a_i$, $P(a_i, a_{-i})$ is the node profit value when the sensor node selects strategy $a_i$, and $C(a_i)$ is the repair energy consumption value when the sensor node selects strategy $a_i$; $\mu_\alpha$ and $\mu_\beta$ are weight coefficients that balance the node profit value against the repair energy consumption; the node profit value is determined by the shortest coordinate distance from the sensor node to the target vulnerability repair point, and the repair energy consumption value is determined by the moving distance of the sensor node, with the expressions:
Figure FDA0003402068010000052
Figure FDA0003402068010000053
wherein ω is a constant that balances the orders of magnitude of the moving distance and the energy consumption;
Figure FDA0003402068010000054
is the shortest coordinate distance from the sensor node to the target repair reconstruction point; c is a constant that discourages unnecessary frequent back-and-forth movement of the sensor node, and $\Delta d_i$ is the moving distance;
(4.2.4) node state protection; the state information of the sensor node comprising the position information, Q value, strategy value and utility function value of the node; if executing the strategy would make the utility function value smaller than that of the previous state, that is, $R_i(t) < R_i(t-1)$, not executing the current action strategy and keeping the current state s; otherwise, executing the current strategy, the sensor node entering the next state s'.
CN202111502028.1A 2021-12-09 2021-12-09 Internet of things coverage vulnerability repairing method based on reinforcement learning Pending CN114168971A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111502028.1A CN114168971A (en) 2021-12-09 2021-12-09 Internet of things coverage vulnerability repairing method based on reinforcement learning

Publications (1)

Publication Number Publication Date
CN114168971A true CN114168971A (en) 2022-03-11

Family

ID=80485033

Country Status (1)

Country Link
CN (1) CN114168971A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116455800A (en) * 2023-03-10 2023-07-18 华中科技大学 Internet of things credibility coverage reliability assessment method based on D-S evidence theory
CN116455800B (en) * 2023-03-10 2024-05-07 华中科技大学 Internet of things credibility coverage reliability assessment method based on D-S evidence theory
CN116456356A (en) * 2023-03-13 2023-07-18 华中科技大学 Reliability evaluation method for large-scale wireless sensor network based on reliable information coverage
CN116456356B (en) * 2023-03-13 2024-02-02 华中科技大学 Reliability evaluation method for large-scale wireless sensor network based on reliable information coverage

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination