CN114786236B - Method and device for heuristic learning of routing protocol by wireless sensor network


Info

Publication number
CN114786236B
Authority
CN
China
Prior art keywords
sensor node
node
neural network
sensor
network
Prior art date
Legal status
Active
Application number
CN202210450316.5A
Other languages
Chinese (zh)
Other versions
CN114786236A (en)
Inventor
刘智斌
刘晓峰
Current Assignee
Qufu Normal University
Original Assignee
Qufu Normal University
Priority date
Filing date
Publication date
Application filed by Qufu Normal University filed Critical Qufu Normal University
Priority to CN202210450316.5A priority Critical patent/CN114786236B/en
Publication of CN114786236A publication Critical patent/CN114786236A/en
Application granted granted Critical
Publication of CN114786236B publication Critical patent/CN114786236B/en


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 40/00 Communication routing or communication path finding
    • H04W 40/02 Communication route or path selection, e.g. power-based or shortest path routing
    • H04W 40/04 Communication route or path selection, e.g. power-based or shortest path routing based on wireless node resources
    • H04W 40/10 Communication route or path selection, e.g. power-based or shortest path routing based on wireless node resources based on available power or energy
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 40/00 Communication routing or communication path finding
    • H04W 40/24 Connectivity information management, e.g. connectivity discovery or connectivity update
    • H04W 40/246 Connectivity information discovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 84/00 Network topologies
    • H04W 84/18 Self-organising networks, e.g. ad-hoc networks or sensor networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00 Reducing energy consumption in communication networks
    • Y02D 30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The application discloses a method and a device for heuristic learning of a routing protocol in a wireless sensor network. When a next-hop sensor node is selected, the method determines the target Q value through a pre-constructed routing algorithm, inspired by the distributed neural network, corresponding to the first sensor node. That is, a two-layer learning structure is constructed: at the bottom layer, the discrete sensor nodes interact with their neighbor sensor nodes to perform reinforcement learning, while at the high layer a distributed neural network is constructed to inspire the bottom-layer learning. A neural network fitting model is thus built dynamically during the autonomous learning of each sensor node, and the learning process of each node is inspired from a global perspective. This improves the efficiency and precision of node learning, keeps the decision of each sensor node up to date with changes in the network topology, and improves the data packet transmission rate, the node survival ratio, and the node energy standard deviation.

Description

Method and device for heuristic learning of routing protocol by wireless sensor network
Technical Field
The application relates to the technical field of wireless sensors, in particular to a method and a device for heuristic learning of routing protocols by a wireless sensor network.
Background
Wireless sensor networks have penetrated every aspect of life, with wide application in agriculture, industry, the military, medical treatment, transportation, fishery, unmanned aerial vehicles, and other fields. A wireless sensor network is composed of a large number of low-cost, low-power sensors, is easy to deploy, and its nodes transmit data through short-distance wireless communication. The network continuously monitors the environment and sends data packets to the base station in a single-hop or multi-hop manner. Because the sensor nodes are battery-powered, their energy is limited; a sensor node far from the base station cannot send data packets to the base station directly, and the packets can only be relayed through other sensor nodes and routed to the base station. However, a path generated by conventional optimization cannot be used continuously: the data packets generated by a large number of nodes would be transmitted through the same path, causing serious congestion of the network, and the continued use of one path would cause the nodes on that path to run out of energy prematurely, paralyzing the entire network early. Therefore, in network routing, improving the working efficiency of the sensor network, prolonging the life cycle of the network, and balancing the electric quantity of the network during operation have become the main goals of research.
Network routing protocols are largely divided into two categories: flat routing protocols and hierarchical routing protocols. In a flat routing protocol the nodes are peers and information can be freely forwarded between them; if an appropriate path is found by an optimization strategy, network energy can be well balanced, but nodes closer to the sink node are heavily loaded and may run out of energy prematurely. A hierarchical routing protocol clusters the nodes and selects particular nodes to control the data flow, extending the lifetime of the wireless sensor network. The most typical hierarchical routing algorithm is the LEACH algorithm, each cycle of which is divided into a broadcast phase, a cluster-building phase, and a stabilization phase. It uses the cluster head to uniformly receive and process the sensing data packets in the cluster. When the cluster head aggregates and forwards data packets, and the forwarded packets are large and the transmission distance is long, the power consumption of the cluster head is high, and the node easily exhausts its energy and dies prematurely.
Therefore, there is a need for a way to effectively extend the lifetime of wireless sensor networks.
Disclosure of Invention
The application provides a method and a device for heuristic learning of a routing protocol of a wireless sensor network, which can effectively prolong the service life of a wireless sensor.
The application provides the following scheme:
In a first aspect, a method for heuristically learning a routing protocol by using a wireless sensor network is provided, where the wireless sensor network includes a base station, a first sensor node, and at least one neighbor sensor node that is disposed adjacent to the first sensor node, the first sensor node is any sensor node in the wireless sensor network except for the base station, and the first sensor node transmits a target data packet to the base station through one neighbor sensor node that is disposed adjacent to the first sensor node; the method comprises the following steps:
Calculating at least one target Q value from the first sensor node to any neighbor sensor node according to a pre-constructed routing algorithm which is inspired by a distributed neural network corresponding to the first sensor node;
the at least one target Q value obtained through calculation is arranged in sequence, and the neighbor sensor node corresponding to the maximum target Q value is used as a target sensor node;
The first sensor node sends a target data packet to the target sensor node, wherein the target data packet comprises data to be transmitted sent by the previous node of the first sensor node, a node identifier of the first sensor node, and the residual energy of the first sensor node at the current moment.
In a preferred embodiment, the method further comprises: constructing a routing algorithm for a distributed neural network heuristic of any sensor node, comprising:
Performing discrete route reinforcement learning based on a Q table based on any sensor node;
constructing a distributed neural network based on all sensor nodes in the wireless sensor network;
And the distributed neural network inspires the discrete route reinforcement learning of any sensor node based on the Q table at the same moment to obtain a routing algorithm inspired by the distributed neural network of the corresponding node.
In a preferred embodiment, the calculating at least one Q value from the first sensor node to any neighboring sensor node at the current moment according to a pre-constructed routing algorithm that is inspired by the distributed neural network corresponding to the first sensor node includes:
Obtaining at least one corresponding first Q value through pre-constructed discrete route reinforcement learning based on a Q table corresponding to the first sensor node according to a reward value r when the first sensor node sends a target data packet to any neighbor sensor node;
Sequencing the at least one first Q value obtained by calculation to obtain a second sensor node with the maximum first Q value obtained by discrete route reinforcement learning based on a Q table;
obtaining a V value of the first sensor node and a V value of the second sensor node based on a pre-constructed distributed neural network, and obtaining a Shaping function based on the V value of the first sensor node and the V value of the second sensor node;
Updating a discrete route reinforcement learning based on a Q table of the first sensor node based on the Shaping function to obtain a routing algorithm inspired by a distributed neural network of the first sensor node;
and calculating at least one target Q value when the first sensor node sends a target data packet to any neighbor sensor node based on a routing algorithm inspired by the distributed neural network of the first sensor node.
In a preferred embodiment, the routing algorithm inspired by the distributed neural network of the first sensor node is shown in formula (1):

Q(i_t, j_t) ← (1 − α)·Q(i_t, j_t) + α·[r_t + μ·F(i_t, j_t) + γ·max_{k∈neighbor(j_t)} Q(j_t, k)]   (1)

wherein μ is a heuristic intensity parameter, and μ ∈ (0, 1); α is the learning rate, and α ∈ (0, 1); γ is a discount factor, and γ ∈ (0, 1); and F(i_t, j_t) is a Shaping function.
In a preferred embodiment, the method further comprises: based on the base station and at least two sensor nodes, constructing a dynamically updated weight increment propagation tree according to a preset physical area dividing method, wherein the method comprises the following steps:
Dividing a deployment area where at least two sensor nodes are located into at least two upper layer unit areas according to a preset physical area dividing method to obtain at least two upper layer clusters, and dividing any upper layer unit area into at least two lower layer unit areas according to the preset physical area dividing method to obtain at least two corresponding lower layer clusters;
Randomly determining an initial lower layer cluster head in at least one sensor node included in any lower layer cluster, randomly determining an initial upper layer cluster head in an upper layer cluster corresponding to any upper layer unit area, wherein the initial upper layer cluster comprises at least two initial lower layer cluster heads;
The initial lower layer cluster head periodically receives a first data packet sent by any neighbor sensor node in the lower layer cluster where the initial lower layer cluster head is located and sends a second data packet to the corresponding initial upper layer cluster head, and the initial upper layer cluster head periodically receives the second data packet sent by the initial lower layer cluster head corresponding to the cluster where the initial upper layer cluster head is located and sends a third data packet to the base station; the first data packet, the second data packet and the third data packet each comprise the node identifier of the corresponding node and the residual energy at the current moment of that node and of its at least one neighbor sensor node;
the initial lower-layer cluster head takes the sensor node with the largest residual electric quantity in the current period in the lower-layer cluster as the lower-layer cluster head of the next period, and the upper-layer cluster head takes the lower-layer cluster head of the next period with the largest residual electric quantity in the upper-layer cluster as the upper-layer cluster head of the next period, so as to form a dynamically updated weight increment propagation tree.
In a preferred embodiment, the method further comprises: transmitting the weight increment of any sensor node in the current period to the base station, accumulated cluster by cluster through the weight increment propagation tree, so as to update the weights of the distributed neural network, comprising the following steps:
acquiring the weight increment of any sensor node in the current period compared with the distributed neural network in the previous period;
Any sensor node transmits the corresponding neural network weight increment to the corresponding lower cluster head through the weight increment propagation tree of the current period;
The lower-layer cluster head accumulates the received neural network weight increment sent by all the sensor nodes in the cluster to obtain a first accumulated neural network weight increment, and sends the first accumulated neural network weight increment to the corresponding upper-layer cluster head;
The upper layer cluster head accumulates the received first accumulated neural network weight increment sent by all the lower layer cluster heads in the cluster to obtain a second accumulated neural network weight increment, and sends the second accumulated neural network weight increment to a base station;
The base station accumulates all the received second accumulated neural network weight increments to obtain the total weight increment of the distributed neural network, and computes the updated weights of the distributed neural network according to the total weight increment;
the base station carries out whole network broadcasting on the updated weight;
any sensor node receives the update weight and updates the distributed neural network.
In a second aspect, a device for heuristic learning of a routing protocol of a wireless sensor network is provided, where the wireless sensor network includes a base station, a first sensor node, and at least one neighbor sensor node that is disposed adjacent to the first sensor node, the first sensor node is any sensor node in the wireless sensor network except for the base station, and the first sensor node transmits a target data packet to the base station through one neighbor sensor node that is disposed adjacent to the first sensor node; the device comprises:
the first processing module is used for calculating at least one target Q value from the first sensor node to any neighbor sensor node according to a pre-constructed routing algorithm inspired by the distributed neural network corresponding to the first sensor node;
The second processing module is used for arranging the at least one target Q value obtained through calculation in sequence, and taking the neighbor sensor node corresponding to the maximum target Q value as a target sensor node;
The third processing module is configured to send, by the first sensor node, a target data packet to the target sensor node, where the target data packet includes data to be transmitted sent by a previous node of the first sensor node, a node identifier of the first sensor node, and remaining energy at a current time of the first sensor node.
In a preferred embodiment, the apparatus further comprises an algorithm construction module for constructing a routing algorithm for a distributed neural network heuristic of any sensor node, comprising:
the first construction unit is used for performing discrete route reinforcement learning based on a Q table based on any sensor node;
The second construction unit is used for constructing a distributed neural network based on all sensor nodes in the wireless sensor network;
the third construction unit is used for the distributed neural network to inspire the discrete route reinforcement learning of any sensor node based on the Q table at the same moment to obtain a routing algorithm inspired by the distributed neural network of the corresponding node.
In a third aspect, there is provided an electronic device comprising:
One or more processors; and
A memory associated with the one or more processors, the memory for storing program instructions that, when read for execution by the one or more processors, perform the method of any of the first aspects.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored, characterized in that the computer program, when being executed by one or more processors, implements the steps of the method according to any of the first aspects.
According to the specific embodiment provided by the application, the application discloses the following technical effects:
The embodiment of the application provides a method and a device for heuristic learning of a routing protocol in a wireless sensor network. The method comprises: calculating, according to a pre-constructed routing algorithm inspired by the distributed neural network corresponding to a first sensor node, at least one target Q value from the first sensor node to any neighbor sensor node at the current moment; arranging the at least one calculated target Q value in order and taking the neighbor sensor node corresponding to the maximum target Q value as the target sensor node; and the first sensor node sending a target data packet to the target sensor node, wherein the target data packet comprises the data to be transmitted sent by the previous node of the first sensor node, the node identifier of the first sensor node, and the residual energy of the first sensor node at the current moment. When the next-hop sensor node is selected, the target Q value is determined through the pre-built routing algorithm inspired by the distributed neural network corresponding to the first sensor node. That is, a two-layer learning structure is constructed: the discrete sensor nodes at the bottom layer interact with their neighbor sensor nodes to perform reinforcement learning, while at the high layer a distributed neural network is built to inspire the bottom-layer learning. A neural network fitting model is thus dynamically built in the autonomous learning process of each sensor node, the learning process of each node is inspired from a global perspective, the learning efficiency and precision of node learning are improved, the decisions of the sensors keep up with changes of the network topology in time, and the data packet transmission rate, the node survival ratio and the node energy standard deviation are improved.
In addition, the method further comprises: constructing a dynamically updated weight increment propagation tree based on the base station and at least two sensor nodes according to a preset physical area dividing method; and transmitting the weight increment of any sensor node in the current period to the base station, accumulated cluster by cluster through the weight increment propagation tree, so as to update the weights of the distributed neural network. The wireless sensor network is divided into two levels of subspaces according to geographic position, cluster heads are selected in each subspace, and the weight increment propagation tree is constructed; the weight increments generated in the sensors' learning process converge and accumulate at the cluster heads of each area, then at the upper-layer cluster heads, and finally at the base station, which broadcasts the updated weight vector to all network nodes. The learning task of the neural network is thereby distributed to each node, realizing distributed machine learning: the neural network provides global decisions at the high layer and is built in a hierarchical, centralized learning mode, while the sensors adopt a flat routing mode and obtain a local optimization scheme through reinforcement learning. From the learning result of the high-level neural network, a global decision trend can be obtained; from the learning result of the table-based bottom layer, local decision information of the nodes can be obtained. The routing performance of the wireless sensor network is thus further improved, and network load balance and electric-quantity balance are reached more rapidly.
Of course, it is not necessary for any of the methods of the present application to be practiced with the advantages described above.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a method for heuristically learning a routing protocol for a wireless sensor network according to an embodiment of the present application;
FIG. 2 is a further flowchart of a method for heuristically learning routing protocols for a wireless sensor network provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of dynamic clustering of sensor nodes by region in an embodiment of the present application;
FIG. 4 is a block diagram of an algorithm of DNN-HR in an embodiment of the present application;
FIG. 5 is a schematic diagram of updating weights by a distributed neural network through a weight increment propagation tree in an embodiment of the present application;
FIG. 6 is a graph showing the packet transfer success rate over time when DNN-HR takes different μ values;
FIG. 7 is a graph showing the change in the ratio of surviving nodes in the network over time when DNN-HR takes on different μ values;
FIG. 8 is a graph showing the standard deviation of the remaining power of a network node over time when DNN-HR takes different μ values;
FIG. 9 is a graph comparing packet transmission success rate over time for various algorithms;
FIG. 10 is a graph of the ratio of surviving nodes in a network versus time under various algorithms;
FIG. 11 is a graph showing the standard deviation of the remaining power of a network node versus time for various algorithms;
FIG. 12 is a schematic diagram of a system provided by an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which are derived by a person skilled in the art based on the embodiments of the application, fall within the scope of protection of the application.
The following describes in detail the specific implementation scheme provided by the embodiment of the present application.
Examples
Referring to fig. 1, the embodiment provides a method for heuristically learning a routing protocol in a wireless sensor network. The wireless sensor network comprises a base station, a first sensor node and at least one neighbor sensor node arranged adjacent to the first sensor node; the first sensor node is any sensor node except the base station in the wireless sensor network, and the first sensor node transmits a target data packet to the base station through one neighbor sensor node arranged adjacent to it. The target data packet comprises the data to be transmitted sent by the previous node of the first sensor node, the node identifier of the first sensor node, and the residual energy of the first sensor node at the current moment. Thus, in this embodiment, the residual energy (i.e., the remaining electric quantity) of a sensor node is transmitted along with the data packet carrying the data to be transmitted, so that the data is relayed by the sensor nodes with the highest residual energy, improving the survival rate of nodes in the wireless sensor network.
Specifically, the method comprises the following steps:
S1, calculating at least one target Q value from a first sensor node to any neighbor sensor node according to a pre-constructed routing algorithm which is inspired by a distributed neural network corresponding to the first sensor node.
Specifically, step S1 includes the following sub-steps:
s11, obtaining at least one corresponding first Q value through pre-constructed discrete route reinforcement learning based on a Q table corresponding to a first sensor node according to a reward value r when the first sensor node sends a target data packet to any neighbor sensor node;
S12, sequencing at least one first Q value obtained by calculation to obtain a second sensor node with the maximum first Q value obtained by discrete route reinforcement learning based on a Q table;
S13, obtaining a V value of a first sensor node and a V value of a second sensor node based on a pre-constructed distributed neural network, and obtaining a Shaping function based on the V value of the first sensor node and the V value of the second sensor node;
S14, updating a discrete route reinforcement learning of the first sensor node based on a Q table based on a Shaping function to obtain a routing algorithm inspired by a distributed neural network of the first sensor node;
and S15, calculating at least one target Q value when the first sensor node sends the target data packet to any neighbor sensor node based on a routing algorithm inspired by the distributed neural network of the first sensor node.
Specifically, the routing algorithm inspired by the distributed neural network of the first sensor node is shown in the following formula (1):

Q(i_t, j_t) ← (1 − α)·Q(i_t, j_t) + α·[r_t + μ·F(i_t, j_t) + γ·max_{k∈neighbor(j_t)} Q(j_t, k)]   (1)

wherein μ is a heuristic intensity parameter, and μ ∈ (0, 1); α is the learning rate, and α ∈ (0, 1); γ is a discount factor, and γ ∈ (0, 1); and F(i_t, j_t) is a Shaping function.
Thus, prior to step S1, the method further comprises Sa, constructing a routing algorithm for a distributed neural network heuristic of any sensor node, comprising:
Sa1, performing discrete route reinforcement learning based on a Q table based on any sensor node;
sa2, constructing a distributed neural network based on all sensor nodes in the wireless sensor network;
And Sa3, the distributed neural network inspires the discrete route reinforcement learning of any sensor node at the same moment based on the Q table to obtain a routing algorithm inspired by the distributed neural network of the corresponding node.
It should be noted that, in order for the sensors' decisions to keep up with changes of the network topology in time and thereby improve the learning efficiency and accuracy of node learning, the constructed routing algorithm inspired by the distributed neural network is updated in real time according to the learning condition of each node, and a network updating channel and method are constructed. Thus, the method further comprises:
sb, constructing a dynamically updated weight increment propagation tree based on a base station and at least two sensor nodes according to a preset physical area dividing method, wherein the method comprises the following steps:
Sb1, dividing a deployment area where at least two sensor nodes are located into at least two upper layer unit areas according to a preset physical area dividing method to obtain at least two upper layer clusters, and dividing any upper layer unit area into at least two lower layer unit areas according to a preset physical area dividing method to obtain at least two corresponding lower layer clusters;
Sb2, randomly determining an initial lower layer cluster head in at least one sensor node included in any lower layer cluster, randomly determining an initial upper layer cluster head in an upper layer cluster corresponding to any upper layer unit area, wherein the initial upper layer cluster comprises at least two initial lower layer cluster heads;
Sb3, periodically receiving a first data packet sent by any neighbor sensor node in the lower layer cluster where the initial lower layer cluster head is located and sending a second data packet to the corresponding initial upper layer cluster head, and periodically receiving a second data packet sent by the initial lower layer cluster head corresponding to the cluster where the initial upper layer cluster head is located and sending a third data packet to the base station by the initial upper layer cluster head; the first data packet, the second data packet and the third data packet respectively comprise node identifiers of corresponding nodes, the node and the current moment residual energy of at least one neighbor sensor node;
and Sb4, using the sensor node with the largest residual electric quantity in the current period in the lower-layer cluster where the sensor node is positioned as the lower-layer cluster head of the next period by the initial lower-layer cluster head, and using the lower-layer cluster head of the next period with the largest residual electric quantity in the upper-layer cluster where the sensor node is positioned as the upper-layer cluster head of the next period by the upper-layer cluster head to form a dynamically updated weight increment propagation tree.
And Sc, transmitting the weight increment of any sensor node in the current period to the base station in a clustering and accumulating way through a weight increment propagation tree so as to update the weight of the distributed neural network, wherein the method comprises the following steps of:
sc1, acquiring weight increment of any sensor node in the current period compared with the distributed neural network in the previous period;
Sc2, any sensor node transmits corresponding neural network weight increment to a corresponding lower-layer cluster head through a weight increment propagation tree of the current period;
Sc3, accumulating the received weight increment of the neural network sent by all the sensor nodes in the cluster by the lower-layer cluster head to obtain a first accumulated weight increment of the neural network, and sending the first accumulated weight increment of the neural network to the corresponding upper-layer cluster head;
sc4, accumulating the received first accumulated neural network weight increment sent by all the lower layer cluster heads in the cluster by the upper layer cluster heads to obtain a second accumulated neural network weight increment, and sending the second accumulated neural network weight increment to the base station;
Sc, the base station accumulates all the received weight increment of the second accumulated neural network, obtains the total weight increment of the distributed neural network, and updates the update weight of the distributed neural network according to the total weight increment;
Sc6, the base station broadcasts the updated weight in the whole network;
and Sc7, any sensor node receives the update weight and updates the distributed neural network.
Therefore, by constructing the weight increment propagation tree as a weight updating channel, this embodiment disperses the learning task to each node, realizes distributed machine learning, and further improves the decision accuracy of each sensor node.
After step S1 is completed, the subsequent steps S2 and S3 are performed.
S2, the first sensor node sequentially arranges at least one target Q value obtained through calculation, and the neighbor sensor node corresponding to the maximum target Q value is used as the target sensor node;
s3, the first sensor node sends a target data packet to the target sensor node, wherein the target data packet comprises data to be transmitted sent by the last node of the first sensor node, a node identifier of the first sensor node and the residual energy of the first sensor node at the current moment.
The method for heuristically learning routing protocols for the wireless sensor network will be further described with reference to specific implementation procedures.
400 wireless sensors are uniformly and randomly deployed in a 2000 m × 2000 m area; considering that the sensor nodes close to the base station bear a large amount of data forwarding, the sensor node density of those areas is increased appropriately. The initial electric quantity of each node is set to 1 J, and the nodes are static during operation. No node can directly acquire global information; it can only obtain routing control information through data exchanges with neighbor nodes. The links between nodes are symmetric: there is no case where node A can receive node B's information but node B cannot receive node A's.
A large number of sensor nodes are distributed in a fixed area; a base station in the network area receives the information perceived by the sensors and can broadcast messages to the whole network. The nodes in the wireless sensor network have the same initial electric quantity and the same physical parameters, and they cannot be recharged. The base station has sufficient computing power and electric power and is connected to a central server via a high-speed transmission medium. When a sensor node generates or receives a data packet that it cannot send directly to the base station, the packet can only be relayed by other nodes and periodically sent to the base station in a multi-hop manner.
In the routing problem of a sensor network, if data is always transmitted along the path with the fewest hops or the shortest distance, the electric energy of the nodes on that path is rapidly exhausted and they stop working, blocking the communication of the whole network so that it can no longer work normally. In a wireless sensor network, balancing the use and scheduling of node energy at the level of the entire network is the fundamental way to solve such problems. The routing problem of wireless sensor networks involves a large number of QoS metrics; for the purposes of discussion and verification of the validity of the algorithm, only the life cycle of the network is selected as the index of network routing. Using the life cycle as the routing index also implies that the number of path hops is small enough and the routing length short enough, since otherwise excessive energy is consumed.
In the research process, the power consumption of the node for sending the data packet, the power consumption of the node for receiving the data packet, the residual power of the neighbor node, the size of the data buffer of the receiving node and the currently known minimum hop count from the receiving node to the base station are also considered.
The sensing radius of each sensor node is fixed; beyond it, nodes cannot communicate. The communication links between nodes are symmetric: there is no case where node A can transmit to node B but node B cannot transmit to node A. In a WSN, each node knows its own position and can easily perceive the positions of adjacent nodes through information interaction.
The initial energy of each sensor node is the same and is denoted E_init; during use, the residual energy of a sensor node is denoted E_rem.
Nodes within the communication radius of sensor node n are considered its neighbors; together they constitute a set denoted neighbor(n), and the communication radius is denoted R_neighbor. When sensor node n sends a data packet of length k bits to a neighbor node at distance d, both the sending end and the receiving end consume energy, as given by formulas (2) and (3):

E_Tx(k, d) = E_elec·k + ε_fs·k·d², if d < d_0
E_Tx(k, d) = E_elec·k + ε_amp·k·d^m, if d ≥ d_0   (2)

E_Rx(k) = E_elec·k   (3)

where k represents the length of a data packet transmitted or received by a node, d represents the distance from the transmitting node to the receiving node, E_Tx(k, d) represents the power consumption of the node to transmit a packet of length k bits to a receiving node at distance d, and E_Rx(k) represents the power consumption of the node to receive a packet of length k bits. E_elec, ε_fs and ε_amp are constants: E_elec represents the energy expended by the transmit or receive circuitry to process each bit of data, ε_fs and ε_amp represent the power consumption of the transmit amplifier to send 1 bit of data over a unit distance, and m is the propagation attenuation index. d_0 is the distance threshold for the amplifier power adjustment, calculated by formula (4):

d_0 = sqrt(ε_fs / ε_amp)   (4)
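A small Python sketch of this radio model is given below for illustration; the constants E_ELEC, EPS_FS, EPS_AMP and the attenuation index m are placeholder values assumed for the example, not the settings of Table 2.

```python
import math

# Assumed placeholder constants (typical first-order radio model values,
# not the patent's Table 2 settings).
E_ELEC = 50e-9        # J/bit spent by transmit/receive circuitry per bit
EPS_FS = 10e-12       # J/bit/m^2, free-space amplifier coefficient
EPS_AMP = 0.0013e-12  # J/bit/m^m, multipath amplifier coefficient
M = 4                 # propagation attenuation index beyond d_0

D0 = math.sqrt(EPS_FS / EPS_AMP)  # distance threshold of formula (4)

def e_tx(k_bits: int, d: float) -> float:
    """Energy to transmit a k-bit packet over distance d, formula (2)."""
    if d < D0:
        return E_ELEC * k_bits + EPS_FS * k_bits * d ** 2
    return E_ELEC * k_bits + EPS_AMP * k_bits * d ** M

def e_rx(k_bits: int) -> float:
    """Energy to receive a k-bit packet, formula (3)."""
    return E_ELEC * k_bits
```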
On this basis, as shown in fig. 2, an exemplary method for heuristically learning a routing protocol by a wireless sensor network includes:
s10, firstly, clustering all wireless sensor nodes according to areas.
A large number of sensor nodes are distributed over a fixed M × M area; the positions of the nodes are fixed and their energy is limited. A base station with unlimited energy is installed within the area. The deployment area of the sensor nodes is uniformly and fixedly divided into L × L sub-areas by position, and each sub-area is again uniformly divided into L × L areas by position. From the network topology this yields two layers: the bottom layer (i.e., the lower or first layer) has (L × L)² sub-areas (clusters), each cluster of size (M × M)/(L × L)²; the upper layer (i.e., the second layer) has L × L sub-areas (clusters), each cluster of size (M × M)/(L × L). During node deployment, the wireless sensor nodes close to the base station bear more forwarding tasks, so more wireless sensor nodes are deployed there.
Cluster heads are randomly selected from the clusters of each layer; each node of the first layer records its cluster head, each cluster head of the second layer stores the base station information, and a hierarchical tree structure of clusters is formed and initialized.
The data of the wireless sensor node information is shown in table 1.
TABLE 1
An exemplary algorithm for step S10 is as follows:
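Since the division of S10 is purely positional, a minimal sketch can map a node's coordinates straight to its two cluster indices; this is an illustrative reading of the step, with all names hypothetical.

```python
def cluster_ids(x: float, y: float, m_side: float, l_side: int) -> tuple[int, int]:
    """Map a node at (x, y) in an m_side x m_side area to its upper-layer
    cluster (an l_side x l_side grid) and its lower-layer cluster (each
    upper cell subdivided again into l_side x l_side)."""
    upper_cell = m_side / l_side          # side length of an upper unit area
    lower_cell = upper_cell / l_side      # side length of a lower unit area
    ux, uy = int(x // upper_cell), int(y // upper_cell)
    lx, ly = int(x // lower_cell), int(y // lower_cell)
    upper_id = ux * l_side + uy                # one of L*L upper clusters
    lower_id = lx * (l_side * l_side) + ly     # one of (L*L)^2 lower clusters
    return upper_id, lower_id
```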
s20, generating a dynamic election cluster head and a tree.
Planning all the sensor nodes, constructing clusters, and generating a tree structure: all sensor nodes are leaf nodes at the bottom of the tree, and after the nodes are divided into clusters by position, the node with the largest residual electric quantity in each sub-area is selected as the cluster head. Electing a cluster head within a cluster is, however, a time-consuming and power-consuming task. In this method, the sensor nodes do not process the cluster head election separately: in the course of normal node communication, the data packets carry the position and electric quantity information of the nodes, and the new cluster heads of the clusters at all levels are determined through overhearing, relaying, and local broadcasts by the cluster heads. Since the hierarchical association of the clusters is determined in S10, a lower-layer cluster head transmits information to the cluster head of its upper-layer cluster, and an upper-layer cluster head transmits information directly to the base station; the dynamic process of cluster head election and tree generation is shown in fig. 3. In a specific implementation, an exemplary algorithm for generating the dynamically elected cluster heads and the tree is as follows:
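A minimal sketch of this passive election, assuming each cluster head merely tracks the best (node id, residual energy) pair overheard during a period; all identifiers are hypothetical.

```python
class ClusterHeadTracker:
    """Tracks the highest-energy node overheard in a cluster during one period."""

    def __init__(self) -> None:
        self.best_id: int | None = None
        self.best_energy = float("-inf")

    def overhear(self, node_id: int, remaining_energy: float) -> None:
        # Every data packet carries (node id, residual energy) and is
        # overheard by neighbors at no extra transmission cost.
        if remaining_energy > self.best_energy:
            self.best_id, self.best_energy = node_id, remaining_energy

    def end_of_period(self) -> int | None:
        # The current cluster head broadcasts this id as the next cluster
        # head, then the counters are cleared for the new round.
        new_head = self.best_id
        self.best_id, self.best_energy = None, float("-inf")
        return new_head
```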
Thus, step S10 has constructed the two-level clusters and cluster heads of the wireless sensor network: a bottom node points to its first-level cluster head, a first-level cluster head points to its second-level cluster head, and a second-level cluster head points to the base station. During data transmission, however, the power consumption of the nodes is unbalanced.
Step S20 describes the dynamic election process of the cluster heads at each layer in this case. Each node continuously collects data during a period T_w0 and then sends a data packet to a designated node according to the routing algorithm; the data packet carries the residual electric quantity of the node. While the data packet is being sent, it is also overheard by the other neighbor nodes, which determine the node with the largest residual energy known in the area and update maxEnergyClusterLevel1 (the maximum electric quantity in the first-layer cluster), maxEnergyClusterLevel2 (the maximum electric quantity in the second-layer cluster), nodeMaxEnergyClusterLevel1 (the node with the maximum electric quantity in the first-layer cluster) and nodeMaxEnergyClusterLevel2 (the node with the maximum electric quantity in the second-layer cluster). Within the period T_w1 or T_w2, the maximum energy of the nodes in a cluster is fully propagated to the current cluster head; when the period ends, the cluster head broadcasts and announces the new cluster head to all nodes in the cluster, the lower-layer nodes of the cluster record this round's cluster head, and maxEnergyClusterLevel1, maxEnergyClusterLevel2, nodeMaxEnergyClusterLevel1 and nodeMaxEnergyClusterLevel2 are cleared, after which a new round of cluster head election begins.
Within the period T_w0, finding the node with the largest residual energy in a cluster is computed during data transmission and response (the process of receiving and transmitting the target data packet), so it consumes no extra energy and does not affect the working efficiency of the wireless sensor network. The periods T_w1 and T_w2 are much larger than T_w0, the amount of data broadcast in a cluster (the first, second and third data packets) is small, and the energy consumed is negligible compared to the transmission of the target data packets.
Therefore, step S20 dynamically elects the cluster heads of each sub-area of each layer and gradually forms a dynamically updated weight increment propagation tree.
S30, a wireless sensor network routing protocol based on reinforcement learning.
In this embodiment, the data packets are transmitted by a flat routing protocol, and the sensor nodes perform reinforcement learning according to the fed-back reward values while transmitting data packets. In the reinforcement learning process, a discrete, table-based routing learning method has high flexibility and yields an accurate value function, but it makes decisions using local value-function information and lacks a global view; as the environment changes continuously, it is difficult to reach the optimal decision quickly, and this is more pronounced in a large-scale network. The node V-value function represented by a deep neural network is more global, better reflects the trend, and learns faster, but it is not flexible enough for node-level learning and decision-making. The routing algorithm inspired by the distributed neural network (Distributed neural network heuristic routing algorithm, DNN-HR) combines discrete, table-based learning with the continuous, neural-network-based heuristic: it retains the local flexibility of learning and decision-making while grasping the trend of the dynamic environment through learning, thereby converging quickly to the optimal strategy. The algorithmic structure of DNN-HR is shown in FIG. 4.
The transferred data packet is regarded as a dynamic agent; at time t, the position of the data packet is regarded as the state s_t, and the state information consists of the two-dimensional coordinates {x, y}. The dynamic agent follows a Markov decision process: the forwarding of a data packet from one node to another is a state transition, to s_{t+1}. The state transition obtains a certain reward value, which involves the following components:
A state-transition reward, transitionReward: if the receiving node is the base station, transitionReward = 100; if there are no neighbor nodes, transitionReward = −10; in other cases, transitionReward = 0.
Residual energy of the receiving node: remainingEnergy.
Energy consumed by the transmitting node: transmissionPower; it depends on the distance between the transmitting and receiving nodes and is obtained by formula (2).
Energy consumption of the receiving node: receivePower, obtained by formula (3).
Packet queue length of receiving node: queueLength.
Minimum number of hops from receiving node to base station: minHops.
The reward value is defined as:

r = ω1·transitionReward + ω2·remainingEnergy + ω3·transmissionPower + ω4·receivePower + ω5·queueLength + ω6·minHops   (5)

where ω1, ω2, ω3, ω4, ω5 and ω6 are learning parameters.
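A small sketch of formula (5) in Python; the ω values below are placeholders, since their settings are not fixed in the text above.

```python
# Assumed placeholder weights omega_1 .. omega_6 of formula (5).
OMEGA = (1.0, 0.5, -0.2, -0.2, -0.1, -0.3)

def reward(transition_reward: float, remaining_energy: float,
           transmission_power: float, receive_power: float,
           queue_length: float, min_hops: float) -> float:
    """Weighted sum of the six reward components, formula (5)."""
    terms = (transition_reward, remaining_energy, transmission_power,
             receive_power, queue_length, min_hops)
    return sum(w * t for w, t in zip(OMEGA, terms))
```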
As defined above, the node where the data packet is located is regarded as a state. At time t, node i sends the data packet to node j, and the Q value is calculated by formula (6):

Q(i, j) = r + γ·max_{k∈neighbor(j)} Q(j, k)   (6)

The iterative update formula of the Q value is defined as formula (7):

Q(i, j) ← (1 − α)·Q(i, j) + α·[r + γ·max_{k∈neighbor(j)} Q(j, k)]   (7)

All Q(i, j) values from node i to nodes j ∈ neighbor(i) are calculated by formula (6). The V value of node i is obtained from the Q values from node i to its neighbor nodes, as shown in formula (8):

V(i) = max_{j∈neighbor(i)} Q(i, j)   (8)
During DNN-HR learning, every action of a sensor node performs the reinforcement-learning update of formula (7), and a Q table is stored discretely in each node. Meanwhile, the sensor network also builds and trains a continuous distributed neural network model, fitting a V-value surface that serves as the Shaping function and inspires the discrete, table-based Q learning.
The weight updating method of the neural network is shown in formula (9), where β is the learning rate and β ∈ (0, 1).
When the agent executes the optimal action and the data packet is transmitted from node i to node j, the V values of node i and node j are calculated by the neural network and recorded as V_NN(i) and V_NN(j) respectively, giving the Shaping function F(i, j) that participates in the update of the Q table, as shown in formula (10):

F(i, j) = γ·V_NN(j) − V_NN(i)   (10)

Formula (7) is then redefined as formula (11):

Q(i, j) ← (1 − α)·Q(i, j) + α·[r + F(i, j) + γ·max_{k∈neighbor(j)} Q(j, k)]   (11)

F(i, j) can further be weighted, with the heuristic intensity controlled by the parameter μ, giving formula (1):

Q(i_t, j_t) ← (1 − α)·Q(i_t, j_t) + α·[r_t + μ·F(i_t, j_t) + γ·max_{k∈neighbor(j_t)} Q(j_t, k)]   (1)

wherein μ is a heuristic intensity parameter, and μ ∈ (0, 1); α is the learning rate, and α ∈ (0, 1); γ is a discount factor, and γ ∈ (0, 1); and F(i_t, j_t) is a Shaping function.
An exemplary algorithm for step S30 is as follows:
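A minimal sketch of one learning step combining formulas (8), (10) and (1); the dict-based Q table and the v_nn callable standing in for the distributed neural network are assumptions made for illustration.

```python
def dnn_hr_update(Q, i, j, r, neighbors, v_nn,
                  alpha=0.5, gamma=0.9, mu=0.4):
    """One heuristic Q update for forwarding a packet from node i to node j."""
    # V value of the receiving node from the discrete Q table, formula (8)
    v_next = max(Q[(j, k)] for k in neighbors[j]) if neighbors[j] else 0.0
    # Shaping function from the neural-network V values, formula (10)
    f = gamma * v_nn(j) - v_nn(i)
    # Heuristically inspired Q update, formula (1)
    Q[(i, j)] = (1 - alpha) * Q[(i, j)] + alpha * (r + mu * f + gamma * v_next)

def choose_next_hop(Q, i, neighbors):
    """Steps S2/S3: forward to the neighbor with the largest target Q value."""
    return max(neighbors[i], key=lambda j: Q[(i, j)])
```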
Distributed updating of the distributed neural network is further described below.
Each sensor node in the sensor network is short-sighted: it can only interact with its neighbor nodes and therefore cannot perceive a network-wide optimization scheme, a problem that is especially obvious in larger networks. Step S30 provides a decision scheme that performs real-time global heuristics by means of a neural network model, so a centrally managed, distributedly learned neural network model needs to be constructed.
In this embodiment, a distributed neural network is constructed, and the nodes perform neural-network-based reinforcement learning while transmitting data packets. The base station has sufficient electric power to act as the neural network parameter server in addition to receiving data packets. Through base station broadcasts, every sensor node shares the weights of the neural network, and each node can update those weights during reinforcement learning. The distributed update path of the neural network is provided by the weight increment propagation tree, and the weight update proceeds from low to high, as shown in fig. 5.
First, the base station broadcasts the weight packet of the neural network; every sensor node receives it and accordingly updates and saves the weight vector in the node. Then, during data packet transmission, reinforcement learning also proceeds, and each node updates the weights of the neural network according to formula (9); the increment of the node's weights is ΔW. After a certain time interval, each node uploads its weight increment ΔW to its first-layer cluster head; the first-layer cluster head accumulates the weight increments sent by all nodes of its cluster and uploads the result to the second-layer cluster head it belongs to; the second-layer cluster head accumulates the weight increments sent by all its first-layer cluster heads and uploads them to the base station; and the base station accumulates the weight increments sent by all second-layer cluster heads and updates the weights of the neural network. Whenever a sensor node or cluster head sends its weight increment ΔW to the upper-layer cluster head or the base station, its ΔW term is cleared.
Specifically, the weight updating method of the distributed neural network is shown in formula (9).
An exemplary algorithm for distributed updating of the neural network is as follows:
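A minimal sketch following steps Sc1 to Sc7; the TD-style local increment is an assumption standing in for formula (9), which is not reproduced here, and all structures are hypothetical.

```python
import numpy as np

def local_increment(r: float, v_i: float, v_j: float,
                    grad_v_i: np.ndarray,
                    beta: float = 0.1, gamma: float = 0.9) -> np.ndarray:
    """Assumed TD(0)-style weight increment against a fixed target network;
    the increment is accumulated as Delta-W instead of applied immediately."""
    td_error = r + gamma * v_j - v_i
    return beta * td_error * grad_v_i

def propagate(node_increments, lower_members, upper_members):
    """Sum Delta-W up the weight increment propagation tree."""
    # Lower-layer cluster heads accumulate their member nodes' increments ...
    lower_sums = {head: sum(node_increments[n] for n in members)
                  for head, members in lower_members.items()}
    # ... upper-layer cluster heads accumulate their lower heads' totals ...
    upper_sums = {head: sum(lower_sums[lh] for lh in heads)
                  for head, heads in upper_members.items()}
    # ... and the base station accumulates everything, updates the weights,
    # and broadcasts them to the whole network.
    return sum(upper_sums.values())
```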
When the weight increments are calculated, the weights of the target neural network are held fixed, which ensures that the reinforcement learning process converges. After each node has gone through T rounds of iteration, the neural network parameters are updated and broadcast through the base station. Steps S10 and S20 create a convenient, energy-balanced channel for parameter updating, so efficient weight-vector updates are achieved through the weight increment propagation tree, and a two-layer heuristic reinforcement learning process is realized through S30.
Therefore, the DNN-HR method provided by this embodiment adopts a layered reinforcement learning method in the control part and flat transmission in the data transmission part; it is a heuristic-learning routing protocol suitable for large-scale wireless sensor networks. The algorithm was verified by simulation, with the physical parameter settings shown in Table 2.
TABLE 2
Each node transmits data packets in a fixed cycle: the time interval for a node to generate a data packet is 15 seconds, and the interval for forwarding a data packet is 100 milliseconds. The Q-table-based reinforcement learning, the neural-network-based reinforcement learning, and the dynamic election of the two layers of cluster heads consume no extra electric energy of the network nodes: when a node sends a data packet to a designated neighbor, the other neighbor nodes overhear the related information, extract the node's residual energy, V value, neural network weights, coordinates and other information from the packet, and update the Q table and the neural network weights inside the node. The only information transmitted additionally is the notification information and weight increments from nodes to cluster heads and from the two layers of cluster heads to the base station; this information is short and carries no data payload. Moreover, the interval at which it is sent is much longer than that of data packet transmission, and it is sent by the node with the largest energy in the cluster. Compared with a conventional flat transmission protocol, this approach consumes very little energy and no extra node energy.
Preferably, each data packet automatically acquires the shortest path to the base station during the learning process, so the number of hops and the transmission path length of data packets need to be reduced. When the number of hops of a packet exceeds the predetermined hop limit TTL, the packet is discarded, so the system needs to keep the packet discard rate low. When a data packet is sent to a node, it is inserted into the node's packet buffer queue; if the buffer is full, overflow occurs. If the electric quantity of the sensor nodes is unbalanced, nodes with lower electric quantity are exhausted too early and become dead nodes; once the number of dead nodes reaches a certain scale, the communication of the network is blocked and a large number of data packets cannot be transmitted to the base station. The operation of the algorithm therefore requires the electric quantity of the nodes to be balanced, and the standard deviation of the nodes' electric quantity should be reduced as much as possible.
Thus, while performing table-based reinforcement learning, the nodes also maintain the training process of the distributed neural network. The neural network under training inspires the learning processes of the individual nodes, and this heuristic two-layer structure yields more timely learning results and improves system performance. This embodiment sets the heuristic ratio μ of the neural network to 0, 20%, 40%, 60%, 80% and 100% and compares the training results. In addition, to verify the effectiveness of the algorithm, this embodiment compares the DNN-HR algorithm with the Q-routing, RLBR and QELAR algorithms.
In the comparison, the evaluation indexes of the algorithms are: the packet delivery ratio (PDR) to the base station over time; the node survival rate (NSR) over time; and the standard deviation of node energy (energy variance, EV) over time.
Fig. 6 shows the effect of the proportion μ with which the neural network takes on the heuristic function during learning on the packet delivery ratio (PDR). When μ=0, the system uses no neural network heuristic and updates the Q values only by table-based reinforcement learning. When μ=100%, learning uses only the distributed neural network. In the learning process from the beginning to 300 seconds, fitting the V value entirely with the neural network gives the worst delivery performance, and the final packet delivery rate reaches only 50.5%. Without the heuristic, i.e. with a purely table-based learning mode (μ=0), 62.8% of the data packets have been successfully received by the base station by the time the network can no longer work normally. The μ=0 case is lower than the intermediate cases because, in a larger network, the Q-value update of one node propagates to the other nodes slowly, so the routing decisions cannot reflect changes in the node performance indexes in time.
We adjusted the weight of the neural network heuristic and ran experiments with μ=20%, μ=40%, μ=60% and μ=80%; all four cases are significantly better than the two extremes. The reason is that, in a larger network environment, the neural network quickly learns the overall decision trend, while the local strategies obtained by table-based learning are finer and more accurate. The experiments confirm that a routing algorithm inspired by the neural network achieves better results; in particular, with μ=40% the system obtains the highest PDR, reaching 78.4%.
In addition, Fig. 7 shows the proportion of surviving nodes as the network operates. When μ=100%, the node states reflected by the neural network are not fine-grained enough, so many data packets are not transmitted along optimal paths, node energy consumption becomes unbalanced during routing, and some nodes die prematurely, which shortens the lifetime of the whole network. After about 750 seconds of operation the first nodes run out of power and become dead nodes, able neither to generate nor to forward data packets; the network nevertheless keeps operating until 2250 seconds when, in connection with Fig. 5, we find that no data packets can reach the base station and the whole network stops operating. As shown in Fig. 7, the proportion of nodes still surviving at that point is 25%.
When μ=0, the larger the network is, the harder it is to obtain global information; nodes cannot obtain timely and accurate decision information and likewise die too early. Dead nodes appear after 1050 seconds, and after 2400 seconds the entire network stops running. As shown in Fig. 7, the DNN-HR algorithm clearly delays node death and extends the network lifetime; by comparison, μ=40% performs best from the node-lifetime point of view: dead nodes begin to appear only after 1650 seconds, and the network does not stop operating normally until 2850 seconds, at which time 17.1% of the nodes are still alive.
To prolong the life cycle of the sensor nodes and the network, node energy must be balanced during network operation, and a useful index for measuring this balance is the standard deviation of node energy. As shown in Fig. 8, different values of μ give different results. Evidently, when μ=0 and when μ=100% the resulting standard deviation is large, indicating that the power consumption of the sensor nodes is relatively unbalanced; Figs. 6 and 7 confirm this: in both cases the data delivery rate is low and a large number of nodes end their life cycle prematurely.
Compared with the two extreme cases μ=0 and μ=100%, the DNN-HR algorithm obtains a lower energy standard deviation, so the network energy distribution is more balanced, the node buffer queues are more even, and the network lifetime is longer. When μ=40%, the standard deviation is lower than for any other value of μ.
Since all sensor nodes start with the same initial energy, the standard deviation of node energy is 0 at the beginning of network operation. As node energy is consumed, the energies begin to diverge; the standard deviation peaks after about 1100 seconds of operation and then gradually falls under the regulation of the learning algorithm.
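The energy-balance index discussed here is simply the standard deviation of the per-node residual energies; for example:

```python
import numpy as np

def energy_std(residual_energies) -> float:
    """Standard deviation of node energies: 0 at start-up, when every node
    holds the same initial charge, and smaller when consumption is balanced."""
    return float(np.std(np.asarray(residual_energies)))
```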
The routing algorithm inspired by the distributed neural network is now compared with the Q-routing, QELAR and RLBR algorithms, with the DNN-HR algorithm using μ=40%.
By comparison, the DNN-HR algorithm performs best. As shown in Fig. 9, it achieves the highest packet delivery ratio, up to 78.4%, while the Q-routing, QELAR and RLBR algorithms reach 55.9%, 62.1% and 67.2%, respectively. From the node and network life-cycle perspective, as shown in Fig. 10, dead nodes appear under the DNN-HR algorithm only after 1650 seconds, versus approximately 450, 1200 and 1350 seconds for Q-routing, QELAR and RLBR; the DNN-HR algorithm can no longer deliver data packets to the base station at 2850 seconds, ending its network life cycle, whereas Q-routing, QELAR and RLBR end theirs at about 1950, 2550 and 2550 seconds. On the standard-deviation index, as shown in Fig. 11, the DNN-HR algorithm is significantly better than the other three algorithms.
Q-routing is a classical reinforcement learning algorithm for sensor routing that realizes adaptive route planning, but it considers only the shortest path and the congestion of the node's data buffer, ignoring the important index of energy consumption; as a result, certain nodes are used frequently and drain prematurely. Although controlling the congestion level also balances routing to some extent, it is far from sufficient for balancing energy: for example, uncongested links are scheduled frequently and their nodes' power is exhausted early. QELAR is a common flat routing algorithm that considers the residual energy of the sending and receiving nodes as well as the difference between a node's residual energy and the average energy of its local area, which greatly improves the energy balance of the network nodes. The RLBR algorithm takes into account the residual energy of the neighbor node, the distance to the neighbor node, and the neighbor node's hop count to the base station, thereby considering both the shortest path and energy loss. Its reward value takes a product form, whereas the other three algorithms use an additive form; because RLBR counts the hop count to the base station in the reward value, it clearly improves learning efficiency.
The performance metrics considered by the DNN-HR algorithm are: the residual energy of the neighbor node, the energy consumed by the node to send data, the energy consumed by the node to receive data, the minimum hop count from the neighbor node to the base station, and the length of the node's data buffer queue. These metrics change constantly during network operation; in particular, node energy is continuously consumed, which keeps changing the effective network topology. Figs. 9 to 11 show that the DNN-HR algorithm adapts better to these changes and improves greatly on the other three algorithms. The operations DNN-HR adds, such as selecting cluster heads and uploading weight increments, consume negligible energy compared with data packet transmission; at the same time, the algorithm adds no extra time cost.
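As an illustration, an additive reward built from the five metrics listed above might look as follows; the weights c1..c5 are assumptions, as the patent does not state them here:

```python
def reward(e_neighbor, e_tx, e_rx, hops_to_bs, queue_len,
           c=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Favor energy-rich next hops; penalize transmit/receive energy cost,
    hop distance to the base station, and a congested buffer queue."""
    c1, c2, c3, c4, c5 = c
    return (c1 * e_neighbor - c2 * e_tx - c3 * e_rx
            - c4 * hops_to_bs - c5 * queue_len)
```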
In summary, in the method for heuristic learning of the routing protocol of the wireless sensor network provided by this embodiment, the next-hop sensor node is selected by calculating target Q values with the pre-constructed routing algorithm, inspired by the distributed neural network, corresponding to the first sensor node. That is, a two-layer learning structure is built: at the bottom layer, each discrete sensor node interacts with its neighbor sensor nodes and performs reinforcement learning, while a distributed neural network is built at the higher layer to inspire the bottom-layer learning. A neural network fitting model is thus constructed dynamically during the autonomous learning of each sensor node, and it inspires the learning process of every node from a global perspective, improving the efficiency and precision of node learning, letting the sensors' decisions keep up with changes in the network topology, and improving the packet delivery ratio, the node survival rate and the node energy standard deviation.
In addition, the method divides the wireless sensor network into two levels of subspaces according to geographic position, selects cluster heads in each subspace, and constructs a weight increment propagation tree. The weight increments generated during sensor learning converge and accumulate to the cluster head of each area, then to the upper-layer cluster head and the base station, and the base station broadcasts the updated weight vector to all network nodes. The learning task of the neural network is thereby distributed over all nodes, realizing distributed machine learning: the neural network provides global decisions and is built at the higher layer in a hierarchical, centralized learning mode, while the sensors use flat routing and obtain locally optimal schemes through reinforcement learning. From the learning results of the high-level neural network, a global decision trend is obtained; from the bottom-layer table-based learning results, the nodes' local decision information is obtained. The routing performance of the wireless sensor network is thereby further improved, and network load balance and energy balance are reached more quickly.
Of course, practicing any one of the methods of the present application does not necessarily require achieving all of the advantages described above at the same time.
Corresponding to the method for heuristically learning the routing protocol in the wireless sensor network, the embodiment also provides a device for heuristically learning the routing protocol in the wireless sensor network, which specifically may include:
the first processing module is used for calculating at least one target Q value from the first sensor node to any neighbor sensor node according to a pre-constructed routing algorithm inspired by the distributed neural network corresponding to the first sensor node;
The second processing module is used for arranging the at least one target Q value obtained through calculation in sequence, and taking the neighbor sensor node corresponding to the maximum target Q value as a target sensor node;
The third processing module is configured to send, by the first sensor node, a target data packet to the target sensor node, where the target data packet includes data to be transmitted sent by a previous node of the first sensor node, a node identifier of the first sensor node, and remaining energy at a current time of the first sensor node.
Preferably, the apparatus further includes an algorithm construction module for constructing a routing algorithm for a distributed neural network heuristic of any sensor node, including:
the first construction unit is used for performing discrete route reinforcement learning based on a Q table based on any sensor node;
The second construction unit is used for constructing a distributed neural network based on all sensor nodes in the wireless sensor network;
the third construction unit is used for the distributed neural network to inspire the discrete route reinforcement learning of any sensor node based on the Q table at the same moment to obtain a routing algorithm inspired by the distributed neural network of the corresponding node.
Preferably, the first processing module includes:
The first processing unit is used for obtaining at least one corresponding first Q value through pre-constructed discrete route reinforcement learning based on a Q table corresponding to the first sensor node according to a reward value r when the first sensor node sends a target data packet to any neighbor sensor node;
The sorting unit is used for sorting the at least one first Q value obtained by calculation to obtain a second sensor node with the largest first Q value obtained by discrete route reinforcement learning based on a Q table;
The second processing unit is used for obtaining the V value of the first sensor node and the V value of the second sensor node based on a pre-constructed distributed neural network, and obtaining a Shaping function based on the V value of the first sensor node and the V value of the second sensor node (a sketch of this computation follows this list);
the third processing unit is used for updating the discrete route reinforcement learning based on the Q table of the first sensor node based on the Shaping function to obtain a routing algorithm inspired by the distributed neural network of the first sensor node;
And the fourth processing unit is used for calculating at least one target Q value when the first sensor node sends the target data packet to any neighbor sensor node based on a routing algorithm inspired by the distributed neural network of the first sensor node.
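Where the V values come from the distributed neural network, a minimal sketch of the Shaping-function computation that the second and third processing units rely on might look as follows, assuming the usual potential-based form F(i, j) = γ·V(j) − V(i); the helper name `shaping` and the callable `v` are illustrative, not from the patent:

```python
from typing import Callable

def shaping(v: Callable[[object], float], state_i, state_j,
            gamma: float) -> float:
    """Potential-based Shaping function: F(i, j) = gamma * V(j) - V(i).

    `v` is the distributed neural network's value estimate; `state_i` is the
    first sensor node's state and `state_j` the selected next hop's state.
    """
    return gamma * v(state_j) - v(state_i)
```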
Preferably, the routing algorithm inspired by the distributed neural network of the first sensor node is:
Wherein μ is a heuristic intensity parameter, and μ∈(0, 1); α is the learning rate, and α∈(0, 1); γ is a discount factor, and γ∈(0, 1); F(i_t, j_t) is a Shaping function.
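The update formula itself appears only as an image in the published text. Given the parameters defined above, a plausible reconstruction, assuming the standard heuristically shaped Q-learning update (this exact form is an assumption, not quoted from the patent), is:

```latex
Q(i_t, j_t) \leftarrow Q(i_t, j_t)
  + \alpha \Big[ r + \mu F(i_t, j_t)
  + \gamma \max_{j_{t+1}} Q\big(j_t, j_{t+1}\big) - Q(i_t, j_t) \Big]
```

with the Shaping term F(i_t, j_t) contributing in proportion to the heuristic intensity μ.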
Preferably, the device includes a weight increment propagation tree construction unit, configured to construct a dynamically updated weight increment propagation tree according to a preset physical area division method based on the base station and at least two sensor nodes, including:
A dividing unit for dividing the deployment area where the at least two sensor nodes are located into at least two upper layer unit areas according to a preset physical area dividing method to obtain at least two upper layer clusters, and for dividing any upper layer unit area into at least two lower layer unit areas according to the preset physical area dividing method to obtain at least two corresponding lower layer clusters;
a determining unit, configured to randomly determine an initial lower layer cluster head in at least one sensor node included in any one of the lower layer clusters, and randomly determine an initial upper layer cluster head in an upper layer cluster corresponding to any one of the upper layer unit areas, where the initial upper layer cluster includes at least two initial lower layer cluster heads;
The fifth processing unit is configured to have the initial lower-layer cluster head periodically receive a first data packet sent by any neighbor sensor node in its lower-layer cluster and send a second data packet to the corresponding initial upper-layer cluster head, and to have the initial upper-layer cluster head periodically receive the second data packets sent by the initial lower-layer cluster heads in its cluster and send a third data packet to the base station; the first data packet, the second data packet and the third data packet each include the node identifier of the corresponding node and the residual energy, at the current time, of that node and of at least one of its neighbor sensor nodes;
And the fourth construction unit is used for the initial lower-layer cluster head to take the sensor node with the largest residual energy in the current period in its lower-layer cluster as the lower-layer cluster head of the next period, and for the upper-layer cluster head to take the next-period lower-layer cluster head with the largest residual energy in its upper-layer cluster as the upper-layer cluster head of the next period, thereby forming a dynamically updated weight increment propagation tree.
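A minimal sketch of the per-period cluster-head rotation described by the fourth construction unit; the attribute name `residual_energy` is illustrative:

```python
def next_cluster_head(members):
    """Pick the member with the largest residual energy as next period's head.

    `members` is any iterable of node objects exposing `residual_energy`;
    applying this at both layers keeps the weight increment propagation tree
    rooted at the best-charged nodes each period.
    """
    return max(members, key=lambda n: n.residual_energy)
```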
Preferably, the apparatus further includes an updating module, configured to accumulate the weight increments of the sensor nodes in the current period through the weight increment propagation tree and transmit them cluster by cluster to the base station, so as to update the weights of the distributed neural network, wherein the updating module includes:
The acquisition unit is used for acquiring the weight increment of any sensor node in the current period compared with the distributed neural network in the previous period;
The sending unit is used for sending the corresponding neural network weight increment to the corresponding lower-layer cluster head through the weight increment propagation tree of the current period by any sensor node;
The first calculation unit is used for accumulating the received neural network weight increment sent by all the sensor nodes in the cluster by the lower-layer cluster head to obtain a first accumulated neural network weight increment, and sending the first accumulated neural network weight increment to the corresponding upper-layer cluster head;
The second calculation unit is used for accumulating the received first accumulated neural network weight increment sent by all the lower layer cluster heads in the cluster to obtain a second accumulated neural network weight increment, and sending the second accumulated neural network weight increment to the base station;
the third calculation unit is used for the base station to accumulate all the received second accumulated neural network weight increments, obtain the total weight increment of the distributed neural network, and derive the updated weights of the distributed neural network according to the total weight increment;
The broadcasting unit is used for the base station to carry out whole network broadcasting on the updated weight;
and the updating unit is used for any sensor node to receive the updating weight and update the distributed neural network.
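The hierarchical accumulation the updating module performs can be sketched as follows (a simplified model: the function names and the direct addition of the total increment are assumptions, and the increments are treated as flat NumPy vectors):

```python
import numpy as np

def accumulate(increments):
    """Element-wise sum of a batch of weight-increment vectors."""
    return np.sum(np.stack(list(increments)), axis=0)

# Lower-layer heads sum their members' increments, upper-layer heads sum the
# lower-layer results, and the base station folds the grand total into the
# weight vector that it then broadcasts to the whole network.
def base_station_update(weights, second_accumulated_increments):
    total = accumulate(second_accumulated_increments)
    return weights + total
```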
For the parts of the device for heuristic learning of the routing protocol in the wireless sensor network not described in this embodiment, reference may be made to the description of the corresponding method, which is not repeated here.
It should be noted that the division into the functional modules described above, used when the heuristic learning routing protocol service of the wireless sensor network is triggered, is only an example; in practical applications the functions may be allocated to different functional modules as needed, i.e., the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the device provided by this embodiment belongs to the same concept as the embodiment of the method for heuristic learning of the routing protocol: the device is based on that method, and its detailed implementation process is described in the method embodiment and is not repeated here.
In addition, the embodiment of the application also provides electronic equipment, which comprises:
One or more processors; and
A memory associated with the one or more processors, the memory for storing program instructions that, when read for execution by the one or more processors, perform the operations of:
Calculating at least one target Q value from the first sensor node to any neighbor sensor node according to a pre-constructed routing algorithm which is inspired by a distributed neural network corresponding to the first sensor node;
the at least one target Q value obtained through calculation is arranged in sequence, and the neighbor sensor node corresponding to the maximum target Q value is used as a target sensor node;
The first sensor node sends a target data packet to the target sensor node, wherein the target data packet comprises data to be transmitted sent by a last node of the first sensor node, a node identifier of the first sensor node and the residual energy of the first sensor node at the current moment.
FIG. 12 illustrates an architecture of a computer system 1500, which may include a processor 1510, a video display adapter 1511, a disk drive 1512, an input/output interface 1513, a network interface 1514, and a memory 1520, among others. The processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, and the memory 1520 may be communicatively connected by a communication bus 1530.
The processor 1510 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and executes related programs to implement the technical solutions provided by the present application.
The memory 1520 may be implemented in the form of ROM (Read-Only Memory), RAM (Random Access Memory), static storage, dynamic storage, or the like. The memory 1520 may store an operating system 1521 for controlling the operation of the computer system 1500 and a Basic Input Output System (BIOS) 1522 for controlling its low-level operation. In addition, a web browser 1523, a data storage management system 1524, an icon font processing system 1525, and the like may also be stored. The icon font processing system 1525 may be an application program that implements the operations of the foregoing steps in the embodiment of the present application. In general, when the present application is implemented in software or firmware, the relevant program code is stored in the memory 1520 and executed by the processor 1510.
The input/output interface 1513 is used to connect input/output devices to enable information input and output. The input/output devices may be configured as components in a device (not shown) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.
The network interface 1514 is used to connect network devices (not shown) to enable communication interactions of the device with other devices. The network device may communicate through a wired manner (such as USB, network cable, etc.), or may communicate through a wireless manner (such as mobile network, WIFI, bluetooth, etc.).
Bus 1530 includes a path for transporting information between various components of the device (e.g., processor 1510, video display adapter 1511, disk drive 1512, input/output interface 1513, network interface 1514, and memory 1520).
In addition, the computer system 1500 may also obtain information of specific retrieval conditions from the virtual resource object retrieval condition information database for making condition judgment, and so on.
It is noted that although the above device shows only the processor 1510, the video display adapter 1511, the disk drive 1512, the input/output interface 1513, the network interface 1514, the memory 1520 and the bus 1530, in a specific implementation the device may include other components necessary for proper operation. Furthermore, it will be appreciated by those skilled in the art that the device may include only the components necessary to implement the solution of the present application, and not all of the components shown in the drawings.
From the above description of embodiments, it will be apparent to those skilled in the art that the present application may be implemented in software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a cloud server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present application.
In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments can be referred to each other, and each embodiment mainly describes its differences from the others. In particular, the system embodiments are described relatively simply because they are substantially similar to the method embodiments; for relevant points, refer to the description of the method embodiments. The systems and system embodiments described above are merely illustrative: the elements described as separate may or may not be physically separate, and the elements shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the solution without creative effort.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (8)

1. The method for heuristically learning the routing protocol by the wireless sensor network is characterized in that the wireless sensor network comprises a base station, a first sensor node and at least one neighbor sensor node arranged adjacent to the first sensor node, wherein the first sensor node is any sensor node except the base station in the wireless sensor network, and the first sensor node transmits a target data packet to the base station through one neighbor sensor node arranged adjacent to the first sensor node; the method comprises the following steps:
according to a pre-constructed routing algorithm which is inspired by a distributed neural network corresponding to the first sensor node, calculating at least one target Q value from the first sensor node to any neighbor sensor node at the current moment, wherein the method comprises the following steps:
Obtaining at least one corresponding first Q value through pre-constructed discrete route reinforcement learning based on a Q table corresponding to the first sensor node according to a reward value r when the first sensor node sends a target data packet to any neighbor sensor node;
Sequencing the at least one first Q value obtained by calculation to obtain a second sensor node with the maximum first Q value obtained by discrete route reinforcement learning based on a Q table;
obtaining a V value of the first sensor node and a V value of the second sensor node based on a pre-constructed distributed neural network, and obtaining a Shaping function based on the V value of the first sensor node and the V value of the second sensor node;
Updating a discrete route reinforcement learning based on a Q table of the first sensor node based on the Shaping function to obtain a routing algorithm inspired by a distributed neural network of the first sensor node;
calculating at least one target Q value when the first sensor node sends a target data packet to any neighbor sensor node based on a routing algorithm inspired by a distributed neural network of the first sensor node;
the routing algorithm inspired by the distributed neural network of the first sensor node is as follows:
Wherein μ is a heuristic intensity parameter, and μ∈(0, 1); α is the learning rate, and α∈(0, 1); γ is a discount factor, and γ∈(0, 1); F(i_t, j_t) is a Shaping function;
the at least one target Q value obtained through calculation is arranged in sequence, and the neighbor sensor node corresponding to the maximum target Q value is used as a target sensor node;
The first sensor node sends a target data packet to the target sensor node, wherein the target data packet comprises data to be transmitted sent by a last node of the first sensor node, a node identifier of the first sensor node and the residual energy of the first sensor node at the current moment.
2. The method of claim 1, wherein the method further comprises: constructing a routing algorithm for a distributed neural network heuristic of any sensor node, comprising:
Performing discrete route reinforcement learning based on a Q table based on any sensor node;
constructing a distributed neural network based on all sensor nodes in the wireless sensor network;
And the distributed neural network inspires the discrete route reinforcement learning of any sensor node based on the Q table at the same moment to obtain a routing algorithm inspired by the distributed neural network of the corresponding node.
3. The method of claim 1 or 2, wherein the method further comprises: based on the base station and at least two sensor nodes, constructing a dynamically updated weight increment propagation tree according to a preset physical area dividing method, wherein the method comprises the following steps:
Dividing a deployment area where at least two sensor nodes are located into at least two upper layer unit areas according to a preset physical area dividing method to obtain at least two upper layer clusters, and dividing any upper layer unit area into at least two lower layer unit areas according to the preset physical area dividing method to obtain at least two corresponding lower layer clusters;
Randomly determining an initial lower layer cluster head in at least one sensor node included in any lower layer cluster, randomly determining an initial upper layer cluster head in an upper layer cluster corresponding to any upper layer unit area, wherein the initial upper layer cluster comprises at least two initial lower layer cluster heads;
The initial lower layer cluster head periodically receives a first data packet sent by any neighbor sensor node in its lower layer cluster and sends a second data packet to the corresponding initial upper layer cluster head, and the initial upper layer cluster head periodically receives the second data packets sent by the initial lower layer cluster heads in its cluster and sends a third data packet to the base station; the first data packet, the second data packet and the third data packet each include the node identifier of the corresponding node and the residual energy, at the current time, of that node and of at least one of its neighbor sensor nodes;
the initial lower-layer cluster head takes the sensor node with the largest residual electric quantity in the current period in the lower-layer cluster as the lower-layer cluster head of the next period, and the upper-layer cluster head takes the lower-layer cluster head of the next period with the largest residual electric quantity in the upper-layer cluster as the upper-layer cluster head of the next period, so as to form a dynamically updated weight increment propagation tree.
4. A method as claimed in claim 3, wherein the method further comprises: transmitting the weight increment of any sensor node in the current period to the base station cluster accumulation through the weight increment propagation tree so as to update the weight of the distributed neural network, wherein the weight increment propagation tree comprises the following steps:
acquiring the weight increment of any sensor node in the current period compared with the distributed neural network in the previous period;
Any sensor node transmits the corresponding neural network weight increment to the corresponding lower cluster head through the weight increment propagation tree of the current period;
The lower-layer cluster head accumulates the received neural network weight increment sent by all the sensor nodes in the cluster to obtain a first accumulated neural network weight increment, and sends the first accumulated neural network weight increment to the corresponding upper-layer cluster head;
The upper layer cluster head accumulates the received first accumulated neural network weight increment sent by all the lower layer cluster heads in the cluster to obtain a second accumulated neural network weight increment, and sends the second accumulated neural network weight increment to a base station;
The base station accumulates all the received weight increment of the second accumulated neural network, obtains the total weight increment of the distributed neural network, and updates the updated weight of the distributed neural network according to the total weight increment;
the base station carries out whole network broadcasting on the updated weight;
any sensor node receives the update weight and updates the distributed neural network.
5. The heuristic learning routing protocol device of the wireless sensor network is characterized in that the wireless sensor network comprises a base station, a first sensor node and at least one neighbor sensor node which is arranged adjacent to the first sensor node, wherein the first sensor node is any sensor node except the base station in the wireless sensor network, and the first sensor node transmits a target data packet to the base station through one neighbor sensor node which is arranged adjacent to the first sensor node; the device comprises:
The first processing module is configured to calculate at least one target Q value from the first sensor node to any neighboring sensor node thereof at the current moment according to a pre-constructed routing algorithm inspired by the distributed neural network corresponding to the first sensor node, and includes:
Obtaining at least one corresponding first Q value through pre-constructed discrete route reinforcement learning based on a Q table corresponding to the first sensor node according to a reward value r when the first sensor node sends a target data packet to any neighbor sensor node;
Sequencing the at least one first Q value obtained by calculation to obtain a second sensor node with the maximum first Q value obtained by discrete route reinforcement learning based on a Q table;
obtaining a V value of the first sensor node and a V value of the second sensor node based on a pre-constructed distributed neural network, and obtaining a Shaping function based on the V value of the first sensor node and the V value of the second sensor node;
Updating a discrete route reinforcement learning based on a Q table of the first sensor node based on the Shaping function to obtain a routing algorithm inspired by a distributed neural network of the first sensor node;
calculating at least one target Q value when the first sensor node sends a target data packet to any neighbor sensor node based on a routing algorithm inspired by a distributed neural network of the first sensor node;
the routing algorithm inspired by the distributed neural network of the first sensor node is as follows:
Wherein μ is a heuristic intensity parameter, and μ∈(0, 1); α is the learning rate, and α∈(0, 1); γ is a discount factor, and γ∈(0, 1); F(i_t, j_t) is a Shaping function;
The second processing module is used for arranging the at least one target Q value obtained through calculation in sequence, and taking the neighbor sensor node corresponding to the maximum target Q value as a target sensor node;
The third processing module is configured to send, by the first sensor node, a target data packet to the target sensor node, where the target data packet includes data to be transmitted sent by a previous node of the first sensor node, a node identifier of the first sensor node, and remaining energy at a current time of the first sensor node.
6. The apparatus of claim 5, further comprising an algorithm construction module for constructing a routing algorithm for a distributed neural network heuristic of any sensor node, comprising:
the first construction unit is used for performing discrete route reinforcement learning based on a Q table based on any sensor node;
The second construction unit is used for constructing a distributed neural network based on all sensor nodes in the wireless sensor network;
the third construction unit is used for the distributed neural network to inspire the discrete route reinforcement learning of any sensor node based on the Q table at the same moment to obtain a routing algorithm inspired by the distributed neural network of the corresponding node.
7. An electronic device, comprising:
One or more processors; and
A memory associated with the one or more processors, the memory for storing program instructions that, when read for execution by the one or more processors, perform the method of any of claims 1-4.
8. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by one or more processors, implements the steps of the method of any of claims 1 to 4.
CN202210450316.5A 2022-04-27 2022-04-27 Method and device for heuristic learning of routing protocol by wireless sensor network Active CN114786236B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210450316.5A CN114786236B (en) 2022-04-27 2022-04-27 Method and device for heuristic learning of routing protocol by wireless sensor network


Publications (2)

Publication Number Publication Date
CN114786236A CN114786236A (en) 2022-07-22
CN114786236B true CN114786236B (en) 2024-05-31

Family

ID=82433566

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210450316.5A Active CN114786236B (en) 2022-04-27 2022-04-27 Method and device for heuristic learning of routing protocol by wireless sensor network

Country Status (1)

Country Link
CN (1) CN114786236B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115835155B (en) * 2022-11-24 2024-06-21 广西电网有限责任公司电力科学研究院 Self-energy-taking sensor self-organizing network networking method
CN116384981B (en) * 2023-05-26 2023-09-12 曲阜恒威水工机械有限公司 Digital twin technology-based trash remover operation maintenance management system


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11663522B2 (en) * 2020-04-27 2023-05-30 Microsoft Technology Licensing, Llc Training reinforcement machine learning systems

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009053954A1 (en) * 2007-10-22 2009-04-30 University College Cork - National University Of Ireland, Cork Robust routing of data in wireless networks
CN101360051A (en) * 2008-07-11 2009-02-04 西安电子科技大学 Energy efficient wireless sensor network routing method
CN102238705A (en) * 2011-07-04 2011-11-09 南京邮电大学 Wireless sensor network topology control method based on artificial neural network (ANN)
KR20150053180A (en) * 2013-11-07 2015-05-15 경희대학교 산학협력단 A reinforce learning based energy-efficient and energy-balanced routing protocol for wireless body area network
CN104053207A (en) * 2014-06-20 2014-09-17 江苏大学 Wireless sensor network spatial query method
CN105072661A (en) * 2015-07-15 2015-11-18 国家电网公司 Clustering multi-hop routing protocol of wireless sensor network
CN107787021A (en) * 2016-08-26 2018-03-09 扬州大学 The radio sensing network Routing Protocol of Uneven Cluster multi-hop based on balancing energy
CN108430089A (en) * 2018-05-04 2018-08-21 南京邮电大学 A kind of vehicular ad hoc network based on deep learning is by cluster-dividing method
CN108809443A (en) * 2018-05-24 2018-11-13 华中科技大学 A kind of submarine optical communication network route method based on multiple agent intensified learning
CN109362113A (en) * 2018-11-06 2019-02-19 哈尔滨工程大学 A kind of water sound sensor network cooperation exploration intensified learning method for routing
CN111553469A (en) * 2020-05-18 2020-08-18 国网江苏省电力有限公司电力科学研究院 Wireless sensor network data fusion method, device and storage medium
CN112469100A (en) * 2020-06-10 2021-03-09 广州大学 Hierarchical routing algorithm based on rechargeable multi-base-station wireless heterogeneous sensor network
CN112367675A (en) * 2020-11-11 2021-02-12 内蒙古大学 Wireless sensor network data fusion method and network system based on self-encoder
CN112512003A (en) * 2020-11-19 2021-03-16 大连理工大学 Dynamic trust model based on long-time and short-time memory network in underwater acoustic sensor network
CN113938978A (en) * 2021-12-08 2022-01-14 华东交通大学 Heterogeneous wireless sensor path finding method based on reinforcement learning
CN114339936A (en) * 2021-12-14 2022-04-12 河南科技大学 Aircraft self-organizing network optimization link state routing mechanism based on Q learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Persis, J. et al., "A Novel Routing Protocol for Underwater Wireless Sensor Network Using Pareto Uninformed and Heuristic Search Techniques," Wireless Personal Communications, vol. 121, no. 3, Dec. 31, 2021 *
Liu Chenyi, Xu Mingwei, Geng Nan, Zhang Xiang, "A Survey of Intelligent Routing Algorithms Based on Machine Learning," Journal of Computer Research and Development, no. 4, 2020 *
Liu Xiaofeng et al., "Research on Memory-Heuristic-Based Reinforcement Learning," Computer Technology and Development, vol. 33, no. 6, Jun. 2023 *

Also Published As

Publication number Publication date
CN114786236A (en) 2022-07-22

Similar Documents

Publication Publication Date Title
CN114786236B (en) Method and device for heuristic learning of routing protocol by wireless sensor network
Al-Kiyumi et al. Fuzzy logic-based routing algorithm for lifetime enhancement in heterogeneous wireless sensor networks
US9647942B2 (en) Content centric and load-balancing aware dynamic data aggregation
Wang et al. QoS multicast routing protocol oriented to cognitive network using competitive coevolutionary algorithm
Xu et al. Distributed topology control with lifetime extension based on non-cooperative game for wireless sensor networks
Arikumar et al. EELTM: An energy efficient LifeTime maximization approach for WSN by PSO and fuzzy-based unequal clustering
CN101600227A (en) A kind of distributed network route selection method and routing device
Tran-Thanh et al. Long-term information collection with energy harvesting wireless sensors: a multi-armed bandit based approach
CN113891426A (en) Distributed multi-node networking method and device
Künzel et al. Weight adjustments in a routing algorithm for wireless sensor and actuator networks using Q-learning
CN114501576B (en) SDWSN optimal path calculation method based on reinforcement learning
Abd et al. Game theoretic energy balanced (GTEB) routing protocol for wireless sensor networks
Lee et al. Multi-Agent Reinforcement Learning in Controlling Offloading Ratio and Trajectory for Multi-UAV Mobile Edge Computing
Liu et al. Intelligent routing algorithm for wireless sensor networks dynamically guided by distributed neural networks
Asvial et al. Non-cooperative Game Leach for Cluster Distribution Routing Method on Wireless Sensor Network (WSN).
Al-Najjar Optimizing MANETs Network Lifetime Using a Proactive Clustering Algorithm
CN116471645A (en) Adaptive selection method of wireless sensor network routing algorithm based on deep reinforcement learning
Tsoumanis et al. Recharging vehicle distance minimization in wireless sensor networks
Gherbi et al. Using adaptive clustering scheme with load balancing to enhance energy efficiency and reliability in delay tolerant with QoS in large-scale mobile wireless sensor networks
Chengetanai et al. Q-PSO: Hybridisation of Particle Swarm Optimisation with Queuing Theory for Mitigating Congestion on Mobile Ad-hoc Networks While Optimising Network Lifetime.
Okazaki et al. ADHOP: An energy aware routing algorithm for mobile wireless sensor networks
Pandey et al. A Short Review Of Some Of The Unequal Clustering Algorithms In Wireless Sensor Networks
Al-Kashoash Congestion Control for 6LoWPAN Wireless Sensor Networks: Toward the Internet of Things
Kavitha et al. Balanced wireless sensor networks with energy efficient bees swarm–harmony Search–Hill Climbing mechanism
Maleki et al. A reinforcement learning-based Bi-objective routing algorithm for energy harvesting mobile ad-hoc networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant