CN113747384B - Deep reinforcement learning-based industrial internet of things energy sustainability decision mechanism
- Publication number
- CN113747384B (CN202110920967.1A)
- Authority
- CN
- China
- Prior art keywords
- sensing module
- sensor
- energy
- data transmission
- throughput
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/30—Services specially adapted for particular environments, situations or purposes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/12—Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W24/00—Supervisory, monitoring or testing arrangements
- H04W24/02—Arrangements for optimising operational condition
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention discloses an industrial Internet of Things energy sustainability decision mechanism based on deep reinforcement learning, comprising the following steps: establishing a sensor wireless local area network; establishing a Markov chain-based state transition model according to the data transmission collision probability of the sensors to obtain their data transmission probability; establishing an energy consumption model according to the data receiving power of the sensors; establishing a throughput optimization model according to the throughput of the sensors; optimizing the sensor wireless local area network according to the data transmission probability, the energy consumption model and the throughput optimization model to obtain an energy sustainable network; and obtaining the contention window of the intelligent sensor as the output of the energy sustainable network. Embodiments of the invention improve system throughput through the throughput optimization model and can be widely applied in the technical field of the Internet of Things.
Description
Technical Field
The invention relates to the technical field of Internet of things, in particular to an industrial Internet of things energy sustainability decision mechanism based on deep reinforcement learning.
Background
A large number of sensors are deployed in the industrial Internet of Things, where they form a wireless sensor network over the IEEE 802.11ax protocol to monitor data from various intelligent devices in real time. Most of these sensors are battery powered and deployed in inaccessible locations, and some are mobile, so replacing their batteries is impractical. Such sensors can instead draw power from a charging dock or from the external environment through wireless charging or energy harvesting. Because the energy of a single sensor is limited, its energy consumption can be optimized by controlling how often it transmits data, so that the transmission throughput in the local area network is maximized.
To handle collisions that may occur when users transmit, the IEEE 802.11ax protocol commonly adopts the conventional binary exponential backoff algorithm: when a collision occurs, a user waits for a random period before retransmitting. The random period is selected according to the contention window (CW) value. A large CW helps avoid collisions but delays data transmission and reduces throughput; a small CW lets the user retransmit quickly but increases the collision probability.
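The contention-window trade-off described above can be illustrated with a minimal Python sketch of binary exponential backoff. The function names and the bounds `cw_min = 16` and `cw_max = 1024` are illustrative assumptions and are not taken from the patent.

```python
import random


def next_contention_window(cw: int, collided: bool,
                           cw_min: int = 16, cw_max: int = 1024) -> int:
    """Binary exponential backoff rule (cw_min / cw_max are assumed values).

    After a collision the window doubles, capped at cw_max;
    after a successful transmission it resets to cw_min.
    """
    if collided:
        return min(2 * cw, cw_max)
    return cw_min


def draw_backoff_slots(cw: int, rng: random.Random) -> int:
    """Draw the number of idle slots to wait, uniformly from [0, cw - 1]."""
    return rng.randrange(cw)
```

A larger `cw` spreads the waiting times of competing senders over more slots, which lowers the chance that two of them pick the same slot but raises the average wait before each transmission.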
In summary, how to adjust the size of the sensor contention window so as to maximize the transmission throughput in the local area network is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
In view of this, the embodiment of the present invention provides an industrial internet of things energy sustainability decision mechanism based on deep reinforcement learning, so as to improve the transmission throughput of the system.
In one aspect, the invention provides an industrial internet of things energy sustainability decision mechanism based on deep reinforcement learning, which comprises:
establishing a sensor wireless local area network, wherein the sensor wireless local area network comprises a gateway and a sensing module, and the sensing module comprises a common sensor and an intelligent sensor;
establishing a Markov chain-based state transition model according to the data transmission collision probability of the sensing module to obtain the data transmission probability of the sensing module;
establishing an energy consumption model according to the data receiving power of the sensing module;
establishing a throughput optimization model according to the throughput of the sensing module, wherein the throughput is used for representing the data volume of a data packet sent by the sensing module within a certain time;
optimizing the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model and the throughput optimization model to obtain an energy sustainable network;
and obtaining a contention window of the intelligent sensor through the energy sustainable network output.
Optionally, the establishing a state transition model based on a Markov chain according to the data transmission collision probability of the sensing module to obtain the data transmission probability of the sensing module includes:
determining the data transmission collision probability of the sensing module when the sensing module transmits data in the sensor wireless local area network;
simulating a data transmission collision process of the sensing module by combining the data transmission collision probability and the discrete time Markov chain, and determining a state transition model based on the Markov chain;
and carrying out normalization condition processing on the state transition model of the Markov chain to obtain the data transmission probability of the sensing module.
Optionally, the establishing an energy consumption model according to the data receiving power of the sensing module includes:
calculating to obtain a signal-to-noise ratio threshold value of the sensing module according to the receiving power of the sensing module;
calculating the energy consumption required by the sensing module for successful transmission according to the signal-to-noise ratio threshold;
and establishing the energy consumption model according to the energy consumption.
Optionally, the optimizing the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model, and the throughput optimization model to obtain an energy sustainable network includes:
building an optimized neural network for the intelligent sensor;
calculating to obtain the residual energy of the intelligent sensor according to the energy consumption model;
and inputting the data transmission probability and the residual energy into the optimized neural network to obtain an energy sustainable network.
Optionally, the obtaining a contention window of the smart sensor through the energy sustainable network output includes:
initializing the energy sustainable network, and determining an initialization system;
randomly generating an initial contention window for the intelligent sensor in the initialization system;
and updating the environment reward for the initial contention window and outputting a target contention window.
Optionally, the establishing a throughput optimization model according to the throughput of the sensing module includes:
the optimization model is as follows:
s.t. C1: $n_{dead} \le 0$,
where d is the distance, t is time, $\alpha_t$ is the discount factor, $\eta_t$ is the throughput of the smart sensor, $n_{dead}$ is the number of energy interruptions of the smart sensor, $CE_{min}$ is the minimum supplemental energy of the sensing module, $CE_j$ is the supplemental energy of the j-th sensor in the sensing module, $CE_{max}$ is the maximum supplemental energy of the sensing module, $d_{min}$ is the minimum distance between the sensing module and the gateway, $d_j$ is the distance between the j-th sensor in the sensing module and the gateway, $d_{max}$ is the maximum distance between the sensing module and the gateway, j is an index variable, and n is the number of sensors in the sensing module.
Optionally, the updating the environmental reward to the initial contention window and outputting the target contention window include:
the environment reward $r_t$ is expressed as:
where $\eta_t$ is the throughput of the smart sensor and $n_{dead}$ is the number of energy interruptions of the smart sensor.
In another aspect, an embodiment of the invention also discloses an industrial Internet of Things energy sustainability decision system based on deep reinforcement learning, comprising:
a first unit, configured to establish a sensor wireless local area network, where the sensor wireless local area network comprises a gateway and a sensing module, and the sensing module comprises an ordinary sensor and an intelligent sensor;
a second unit, configured to establish a Markov chain-based state transition model according to the data transmission collision probability of the sensing module to obtain the data transmission probability of the sensing module;
a third unit, configured to establish an energy consumption model according to the data receiving power of the sensing module;
a fourth unit, configured to establish a throughput optimization model according to throughput of the sensing module, where the throughput is used to characterize a data volume of a data packet sent by the sensing module within a certain time;
a fifth unit, configured to optimize the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model, and the throughput optimization model, so as to obtain an energy sustainable network;
and a sixth unit, configured to obtain a contention window of the intelligent sensor through the energy sustainable network output.
On the other hand, the embodiment of the invention also discloses an electronic device, which comprises a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method as described above.
In another aspect, an embodiment of the present invention further discloses a computer-readable storage medium, where the storage medium stores a program, and the program is executed by a processor to implement the method described above.
In another aspect, an embodiment of the present invention further discloses a computer program product or a computer program, where the computer program product or the computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and the computer instructions executed by the processor cause the computer device to perform the foregoing method.
Compared with the prior art, the technical scheme adopted by the invention has the following technical effects: according to the invention, a Markov chain-based state transition model is established according to the data transmission collision probability of the sensing module, so that the data transmission probability of the sensing module is obtained; establishing an energy consumption model according to the data receiving power of the sensing module; establishing a throughput optimization model according to the throughput of the sensing module; optimizing the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model and the throughput optimization model to obtain an energy sustainable network; the throughput of the smart sensor can be improved without power interruption.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a detailed flow chart of an embodiment of the present invention;
FIG. 2 is a topology diagram of the sensor wireless local area network according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The embodiment of the invention discloses an industrial Internet of things energy sustainability decision mechanism based on deep reinforcement learning, which comprises the following steps:
S1, establishing a sensor wireless local area network, wherein the sensor wireless local area network comprises a gateway and a sensing module, and the sensing module comprises an ordinary sensor and an intelligent sensor;
S2, establishing a Markov chain-based state transition model according to the data transmission collision probability of the sensing module to obtain the data transmission probability of the sensing module;
S3, establishing an energy consumption model according to the data receiving power of the sensing module;
S4, establishing a throughput optimization model according to the throughput of the sensing module, wherein the throughput is used for representing the data volume of the data packets sent by the sensing module within a certain time;
S5, optimizing the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model and the throughput optimization model to obtain an energy sustainable network;
and S6, obtaining a contention window of the intelligent sensor through the energy sustainable network output.
Referring to FIG. 2, a sensor wireless local area network is established under the IEEE 802.11ax protocol by a gateway 1 and a plurality of wireless sensors connected to it. The wireless sensors comprise an intelligent sensor 3 and a plurality of ordinary sensors 2, and for their signal transmission they obtain electrical energy from the surrounding environment as an energy supplement. All sensors can communicate over the network only via the gateway 1; they cannot communicate directly with each other, and each processes only one packet at a time. There is a time-varying wireless channel between the gateway 1 and every sensor. The channel coefficients are expressed as $h = \{h_i \mid i \in n\}$, where the channel coefficient between the gateway 1 and the intelligent sensor 3 in the t-th period is denoted $h_{n,t}$, and the channel is constant during each period of length T. Further, $b(t)$ ($0 \le b(t) \le m$) represents the number of backoffs at time t, where m is the maximum number of backoffs minus one; $s(t)$ represents the stochastic process of the backoff stage $(0, 1, \ldots, m)$ of the sensor at a given time t. The ordinary sensors 2 update their contention window size randomly, while the intelligent sensor 3 dynamically selects the optimal contention window through deep reinforcement learning and interaction with the environment.
Further as a preferred embodiment, in the step S2, establishing a state transition model based on a Markov chain according to the data transmission collision probability of the sensing module to obtain the data transmission probability of the sensing module includes:
determining the data transmission collision probability of the sensing module when the sensing module transmits data in the sensor wireless local area network;
simulating a data transmission collision process of the sensing module by combining the data transmission collision probability and the discrete time Markov chain, and determining a state transition model based on the Markov chain;
and carrying out normalization condition processing on the state transition model of the Markov chain to obtain the data transmission probability of the sensing module.
At the start of each stage of data transmission in the sensor wireless network, a sensor randomly selects, within a certain range, a data transmission collision probability p, which represents the probability that a data packet transmitted on the channel collides. In each transmission attempt, a packet sent by the sensor collides with constant and independent probability p. The two-dimensional process {s(t), b(t)} is modeled as a discrete-time Markov chain, whose non-zero one-step transition probabilities can be expressed as follows:
where i is the value of s(t), k is the value of b(t), m is the maximum number of backoffs minus one, $W_0$ is the initial contention window of the smart sensor, $W_i$ is the contention window of the i-th ordinary sensor, and $W_m$ is the contention window of the smart sensor.
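As a hedged point of reference, the classical Bianchi analysis of IEEE 802.11 backoff uses the same two-dimensional chain {s(t), b(t)} with collision probability p. Its non-zero one-step transition probabilities are reproduced below, writing $P\{i_1,k_1 \mid i_0,k_0\}$ for $P\{s(t+1)=i_1, b(t+1)=k_1 \mid s(t)=i_0, b(t)=k_0\}$. That the patent's expression matches this form term for term is an assumption.

```latex
\begin{cases}
P\{i,k \mid i,k+1\} = 1, & k \in [0, W_i - 2],\; i \in [0, m] \\[4pt]
P\{0,k \mid i,0\} = \dfrac{1-p}{W_0}, & k \in [0, W_0 - 1],\; i \in [0, m] \\[4pt]
P\{i,k \mid i-1,0\} = \dfrac{p}{W_i}, & k \in [0, W_i - 1],\; i \in [1, m] \\[4pt]
P\{m,k \mid m,0\} = \dfrac{p}{W_m}, & k \in [0, W_m - 1]
\end{cases}
```

The first line is the counter decrementing by one each slot; the second is a successful transmission returning to stage 0; the third and fourth are a collision moving to the next stage, or staying in the last stage m.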
When the system tends to a steady state, $b_{i,k} = \lim_{t \to \infty} P\{s(t) = i, b(t) = k\}$, $k \in [0, W_i - 1]$, $i \in [0, m]$, denotes the stationary distribution of the Markov chain. A closed-form solution for the Markov chain is derived as follows:
$b_{i-1,0} \cdot p = b_{i,0} \;\rightarrow\; b_{i,0} = p^{i}\, b_{0,0}, \quad 0 < i < m;$
where $b_{i-1,0}$ denotes the state $\{s(t) = i-1, b(t) = 0\}$, p denotes the conditional collision probability, $b_{i,0}$ denotes $\{s(t) = i, b(t) = 0\}$, $b_{m-1,0}$ denotes $\{s(t) = m-1, b(t) = 0\}$, $b_{m,0}$ denotes $\{s(t) = m, b(t) = 0\}$, $b_{0,0}$ denotes $\{s(t) = 0, b(t) = 0\}$, i is an index variable, and m is the maximum number of backoffs minus one.
Further, the following is obtained:
which simplifies to:
Applying the normalization condition to the Markov chain and simplifying gives:
that is:
Let τ denote the probability that a sensor transmits during a randomly selected time period. Since any transmission occurs when the backoff time counter equals zero, the data transmission probability τ of the sensor is:
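The derivation above can be checked numerically with a small Python sketch. It assumes the standard Bianchi-style chain: $b_{i,0} = p^i b_{0,0}$ for $0 < i < m$ as stated in the text, a last stage satisfying $b_{m,0} = \frac{p^m}{1-p} b_{0,0}$, and a counter that is uniform over $[0, W_i - 1]$ on entering stage i, so that stage i carries total mass $b_{i,0}(W_i + 1)/2$. The function name and the example window sizes are hypothetical.

```python
from typing import Sequence


def transmission_probability(p: float, windows: Sequence[int]) -> float:
    """Return tau for backoff stages 0..m with contention windows W_0..W_m.

    Assumed model (Bianchi-style, not confirmed by the patent text):
      b_{i,0} = p**i * b_{0,0}              for 0 <= i < m
      b_{m,0} = p**m / (1 - p) * b_{0,0}    for the last stage
      sum over k of b_{i,k} = b_{i,0} * (W_i + 1) / 2
    tau is the total probability of states whose counter equals zero.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    if not windows:
        raise ValueError("at least one contention window is required")
    m = len(windows) - 1
    # Normalization: the stationary probabilities must sum to one.
    total = 0.0
    for i, w in enumerate(windows):
        ratio = p ** i if i < m else p ** m / (1.0 - p)
        total += ratio * (w + 1) / 2.0
    b00 = 1.0 / total
    # Summing b_{i,0} over all stages gives b00 / (1 - p).
    return b00 / (1.0 - p)


# Example with hypothetical windows that double per stage.
tau = transmission_probability(0.2, [16, 32, 64])
```

Because the windows are passed in explicitly, the same function covers both the doubling windows of ordinary backoff and a window chosen by a learning agent.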
Further preferably, in step S3, the establishing an energy consumption model according to the data receiving power of the sensing module includes:
calculating to obtain a signal-to-noise ratio threshold value of the sensing module according to the receiving power of the sensing module;
calculating energy consumption required by the sensing module for successful transmission according to the signal-to-noise ratio threshold;
and establishing the energy consumption model according to the energy consumption.
Wherein the sensor stores the collected energy in a supercapacitor for data transmission, and the energy capacity of the supercapacitor is denoted $C_{max}$. Considering a general energy harvesting model, we assume that in period t the sensor replenishes the supercapacitor through energy recovery at a rate of $CE_t$ joules/millisecond. The energy collected by the sensor in each stage is time-varying because of sensor mobility and changes in the transmission environment. In addition, we assume that the receivers of all devices experience additive white Gaussian noise with zero mean and equal variance $\sigma^2$. To ensure that the signal sent by a sensor is successfully captured by the gateway, the signal-to-noise ratio (SNR) received at the gateway must exceed the capture threshold. Therefore, given the sensor data transmission probability τ and a minimum received power $P_0$, the lowest threshold ζ of the received SNR at the gateway is:
where h is the channel coefficient, $P_0$ is the minimum received power, and $\sigma^2$ is the variance of the Gaussian white noise.
Therefore, the minimum energy $E_0$ consumed by a successful transmission of the sensor during period t is:
where ζ is the lowest SNR threshold, $\sigma^2$ is the variance of the Gaussian white noise, ΔT is the time required to transmit a packet, and $h_t$ is the channel coefficient in period t.
To achieve sustainable use of energy, every sensor consumes energy $E_0$ per uplink transmission. Suppose that during period t the sensor transmits $z_t$ data packets in total; after period t ends, the energy consumption model is:
where $E_t$ is the charge of the supercapacitor before period t begins, $z_t$ is the number of data packets transmitted in period t, $E_0$ is the minimum energy consumption, $CE_t$ is the energy recovered to replenish the capacitor during period t, and T is the length of period t.
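One plausible reading of the energy bookkeeping above is sketched below in Python. Both closed forms are assumptions inferred only from the listed symbols: $E_0 = \zeta \sigma^2 \Delta T / |h_t|^2$ (the transmit power that just meets the SNR threshold, multiplied by the packet time) and $E_{t+1} = \min(C_{max},\, E_t - z_t E_0 + CE_t \cdot T)$, floored at zero. The function and parameter names are hypothetical.

```python
def min_transmit_energy(snr_threshold: float, noise_power: float,
                        packet_time: float, channel_gain: float) -> float:
    """Assumed form: E_0 = zeta * sigma^2 * dT / |h_t|^2."""
    if channel_gain == 0:
        raise ValueError("channel gain must be non-zero")
    return snr_threshold * noise_power * packet_time / abs(channel_gain) ** 2


def next_stored_energy(e_t: float, z_t: int, e0: float,
                       ce_t: float, period: float, c_max: float) -> float:
    """Assumed form: E_{t+1} = min(C_max, E_t - z_t * E_0 + CE_t * T).

    ce_t is the harvesting rate (energy per unit time) and period is T.
    The result is floored at zero, since a supercapacitor cannot hold
    negative charge; a value of zero corresponds to an energy interruption.
    """
    e_next = e_t - z_t * e0 + ce_t * period
    return max(0.0, min(c_max, e_next))
```

Under this reading, a weaker channel (smaller $|h_t|$) raises the cost of each packet, so the number of packets a sensor can afford per period depends on both the channel state and the harvested energy.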
Further as a preferred embodiment, in the step S5, optimizing the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model, and the throughput optimization model to obtain an energy sustainable network includes:
building an optimized neural network for the intelligent sensor;
calculating to obtain the residual energy of the intelligent sensor according to the energy consumption model;
and inputting the data transmission probability and the residual energy into the optimized neural network to obtain an energy sustainable network.
Four neural networks are built for the intelligent sensor: an execution policy network for selecting the contention window value; an execution evaluation network for evaluating the contention window value; a target policy network, which stabilizes training and provides the contention window value used when updating the execution evaluation network; and a target evaluation network, which provides the next value used when updating the execution evaluation network. With the data transmission probability determined, the residual energy of the intelligent sensor is calculated from the energy consumption model, and the channel state information, distance movement, recovered energy and residual energy are input into the execution policy network as the observation, yielding the optimized wireless local area network, namely the energy sustainable network.
Further preferably, in step S6, obtaining the contention window of the smart sensor through the energy sustainable network output includes:
initializing the energy sustainable network, and determining an initialization system;
randomly generating an initial contention window for the intelligent sensor in the initialization system;
and updating the environment reward for the initial contention window and outputting a target contention window.
Wherein, the experience replay pool of the energy sustainable network, the contention windows of the ordinary sensors and the system environment are initialized. Each ordinary sensor independently and randomly sets its initial contention window size within a certain range, and the contention window value of the intelligent sensor in the initialized system is randomly generated. According to the energy consumption model, energy can be used sustainably provided that every sensor consumes energy $E_0$ when transmitting a single data packet in the uplink. All sensors then transmit data packets according to the SEH-CSMA/CA protocol; after a certain period, the number of data packets successfully transmitted by the intelligent sensor and its energy interruptions are counted, and the throughput of the intelligent sensor in this stage is calculated. The currently obtained reward r is calculated according to the reward formula. Meanwhile, the intelligent sensor computes the loss function of the execution evaluation network using the mean square error and updates the execution evaluation network by gradient backpropagation; at the same time, the states of all sensors and the contention window value of the intelligent sensor at the current moment are input into the execution evaluation network to obtain the state-action value, which is used to update the execution policy network by gradient backpropagation. In addition, the target policy network and the target evaluation network are updated gradually by copying a certain proportion of the parameters at each step. By iterating these processes repeatedly, the intelligent sensor adapts to changes in the dynamic conditions and obtains the size of its contention window.
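The gradual target-network update described above, copying a certain proportion of the parameters at each step, is commonly implemented as a soft (Polyak) update. The sketch below treats each network as a flat list of parameters; the name `soft_update` and the mixing proportion `rho` are illustrative assumptions, and the patent does not state the proportion used.

```python
from typing import List


def soft_update(target: List[float], online: List[float],
                rho: float) -> List[float]:
    """Blend online parameters into target parameters.

    new_target = (1 - rho) * target + rho * online, element by element.
    rho = 1.0 copies the online network outright; a small rho makes the
    target network trail the online network slowly, which stabilizes the
    values used to train the evaluation network.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    if len(target) != len(online):
        raise ValueError("parameter lists must have the same length")
    return [(1.0 - rho) * t + rho * o for t, o in zip(target, online)]
```

The same function would be applied once per training step to both the target policy network and the target evaluation network.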
Further as a preferred embodiment, establishing a throughput optimization model according to the throughput of the sensing module includes:
the optimization model is as follows:
s.t. C1: $n_{dead} \le 0$,
where d is the distance, t is time, $\alpha_t$ is the discount factor, $\eta_t$ is the throughput of the smart sensor, $n_{dead}$ is the number of energy interruptions of the smart sensor, $CE_{min}$ is the minimum supplemental energy of the sensing module, $CE_j$ is the supplemental energy of the j-th sensor in the sensing module, $CE_{max}$ is the maximum supplemental energy of the sensing module, $d_{min}$ is the minimum distance between the sensing module and the gateway, $d_j$ is the distance between the j-th sensor in the sensing module and the gateway, $d_{max}$ is the maximum distance between the sensing module and the gateway, j is an index variable, and n is the number of sensors in the sensing module.
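Only constraint C1 is legible in the text above. Based on the symbol definitions that follow it, one consistent reading of the full model is a discounted-throughput maximization with box constraints on supplemental energy and distance. The objective and the constraints labelled C2 and C3 below are inferred from those definitions and should be treated as assumptions, including the choice to write the discount factor as a power $\alpha^{t}$.

```latex
\begin{aligned}
\max \quad & \sum_{t} \alpha^{t}\, \eta_{t} \\
\text{s.t.} \quad
& C1:\; n_{dead} \le 0, \\
& C2:\; CE_{min} \le CE_{j} \le CE_{max}, \quad j = 1, \ldots, n, \\
& C3:\; d_{min} \le d_{j} \le d_{max}, \quad j = 1, \ldots, n.
\end{aligned}
```

Since $n_{dead}$ counts energy interruptions and cannot be negative, C1 amounts to requiring that the smart sensor never runs out of energy.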
Further as a preferred embodiment, updating the environment reward for the initial contention window, and outputting a target contention window, includes:
the environment reward $r_t$ is expressed as:
where $\eta_t$ is the throughput of the smart sensor and $n_{dead}$ is the number of energy interruptions of the smart sensor.
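The text states only that the reward depends on the throughput $\eta_t$ and the number of energy interruptions $n_{dead}$. The Python sketch below shows one hypothetical shape for such a reward, throughput minus a penalty per interruption, together with the discounted sum that a learning agent would seek to maximize. The linear penalty form, the `penalty` weight and the function names are assumptions and are not the patent's formula.

```python
from typing import Sequence


def reward(throughput: float, n_dead: int, penalty: float = 1.0) -> float:
    """Hypothetical reward: throughput minus a penalty per energy interruption."""
    if n_dead < 0:
        raise ValueError("n_dead cannot be negative")
    return throughput - penalty * n_dead


def discounted_return(rewards: Sequence[float], alpha: float) -> float:
    """Sum of alpha**t * r_t over the periods t = 0, 1, 2, ..."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return sum((alpha ** t) * r for t, r in enumerate(rewards))
```

With a reward of this shape, a contention window that raises throughput but drains the supercapacitor is scored lower than one that keeps the sensor powered, which matches the stated goal of improving throughput without energy interruption.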
With reference to FIG. 1, the process of the invention is as follows:
A sensor wireless local area network consisting of a gateway, ordinary sensors and an intelligent sensor is established. The sensors transmit data in the wireless local area network, and collisions occur during transmission; the probability that each sensor transmits data in a time-varying environment is obtained by mathematical analysis of the Markov chain-based state transition model. An energy consumption model is established under the condition that the data receiving power at the sensor is $P_0$, and a throughput optimization model is established according to the throughput of the sensor. With the data transmission probability of the sensor determined, the sensor wireless local area network is optimized according to the energy consumption model and the throughput optimization model to obtain an energy sustainable network, and the energy sustainable network is computed and updated to obtain the contention window of the intelligent sensor.
The embodiment of the invention also provides an industrial internet of things energy sustainability decision system based on deep reinforcement learning, which comprises:
a first unit, configured to establish a sensor wireless local area network, where the sensor wireless local area network comprises a gateway and a sensing module, and the sensing module comprises a common sensor and an intelligent sensor;
a second unit, configured to establish a Markov chain-based state transition model according to the data transmission collision probability of the sensing module, so as to obtain the data transmission probability of the sensing module;
a third unit, configured to establish an energy consumption model according to the data receiving power of the sensing module;
a fourth unit, configured to establish a throughput optimization model according to the throughput of the sensing module, where the throughput is used to characterize the data volume of the data packets sent by the sensing module within a certain time;
a fifth unit, configured to optimize the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model, and the throughput optimization model, so as to obtain an energy sustainable network;
and a sixth unit, configured to obtain a contention window of the intelligent sensor from the output of the energy sustainable network.
Corresponding to the method of fig. 1, an embodiment of the present invention further provides an electronic device, including a processor and a memory; the memory is used for storing programs; the processor executes the program to implement the method as described above.
Corresponding to the method of fig. 1, the embodiment of the present invention also provides a computer-readable storage medium, which stores a program, and the program is executed by a processor to implement the method as described above.
The embodiment of the invention also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and executed by the processor to cause the computer device to perform the method illustrated in fig. 1.
In summary, the embodiments of the present invention have the following advantages:
(1) According to the embodiment of the invention, the data transmission probability of each sensor is analyzed through the Markov chain-based state transition model, so that the accuracy of the system can be improved.
(2) According to the embodiment of the invention, the sensor wireless local area network is optimized through the throughput optimization model, so that the throughput of the system can be improved.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following technologies, which are well known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (7)
1. A deep reinforcement learning-based industrial internet of things energy sustainability decision method, characterized by comprising the following steps:
establishing a sensor wireless local area network, wherein the sensor wireless local area network comprises a gateway and a sensing module, and the sensing module comprises a common sensor and an intelligent sensor;
establishing a Markov chain-based state transition model according to the data transmission collision probability of the sensing module to obtain the data transmission probability of the sensing module;
establishing an energy consumption model according to the data receiving power of the sensing module;
establishing a throughput optimization model according to the throughput of the sensing module, wherein the throughput is used for representing the data volume of a data packet sent by the sensing module within a certain time;
optimizing the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model and the throughput optimization model to obtain an energy sustainable network;
obtaining a contention window of the intelligent sensor from the output of the energy sustainable network;
wherein, the optimizing the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model and the throughput optimization model to obtain an energy sustainable network comprises:
building an optimized neural network for the intelligent sensor;
calculating to obtain the residual energy of the intelligent sensor according to the energy consumption model;
inputting the data transmission probability and the residual energy into the optimized neural network to obtain the energy sustainable network;
the obtaining of the contention window of the intelligent sensor from the output of the energy sustainable network comprises:
initializing the energy sustainable network, and determining an initialization system;
randomly generating an initial contention window for the intelligent sensor in the initialization system;
updating the initial contention window according to the environment reward, and outputting a target contention window;
the updating of the initial contention window according to the environment reward and the outputting of the target contention window comprise:
the environment reward $r_t$ is expressed as:
where $\eta_t$ is the throughput of the intelligent sensor and $n_{dead}$ is the number of energy interruptions of the intelligent sensor.
2. The deep reinforcement learning-based industrial internet of things energy sustainability decision method as claimed in claim 1, wherein the establishing a Markov chain-based state transition model according to the data transmission collision probability of the sensing module to obtain the data transmission probability of the sensing module comprises:
determining the data transmission collision probability of the sensing module when the sensing module transmits data in the sensor wireless local area network;
simulating a data transmission collision process of the sensing module by combining the data transmission collision probability and the discrete time Markov chain, and determining a state transition model based on the Markov chain;
and applying the normalization condition to the Markov chain-based state transition model to obtain the data transmission probability of the sensing module.
3. The deep reinforcement learning-based industrial internet of things energy sustainability decision method according to claim 1, wherein the building of the energy consumption model according to the data receiving power of the sensing module comprises:
calculating to obtain a signal-to-noise ratio threshold value of the sensing module according to the receiving power of the sensing module;
calculating energy consumption required by the sensing module for successful transmission according to the signal-to-noise ratio threshold;
and establishing the energy consumption model according to the energy consumption.
4. The deep reinforcement learning-based industrial internet of things energy sustainability decision method according to claim 1, wherein the establishing of a throughput optimization model according to the throughput of the sensing module comprises:
the optimization model is as follows:
s.t. C1: $n_{dead} \le 0$,
where $d$ is distance, $t$ is time, $\alpha_t$ is the discount factor, $\eta_t$ is the throughput of the intelligent sensor, $n_{dead}$ is the number of energy interruptions of the intelligent sensor, $CE_{min}$ is the minimum supplemental energy of the sensing module, $CE_j$ is the supplemental energy of the $j$-th sensor in the sensing module, $CE_{max}$ is the maximum supplemental energy of the sensing module, $d_{min}$ is the minimum distance between the sensing module and the gateway, $d_j$ is the distance between the $j$-th sensor in the sensing module and the gateway, $d_{max}$ is the maximum distance between the sensing module and the gateway, $j$ is the sensor index, and $n$ is the number of sensors in the sensing module.
5. An industrial internet of things energy sustainability decision system based on deep reinforcement learning, comprising:
a first unit, configured to establish a sensor wireless local area network, where the sensor wireless local area network comprises a gateway and a sensing module, and the sensing module comprises a common sensor and an intelligent sensor;
a second unit, configured to establish a Markov chain-based state transition model according to the data transmission collision probability of the sensing module, so as to obtain the data transmission probability of the sensing module;
a third unit, configured to establish an energy consumption model according to the data receiving power of the sensing module;
a fourth unit, configured to establish a throughput optimization model according to the throughput of the sensing module, where the throughput is used to characterize the data volume of the data packets sent by the sensing module within a certain time;
a fifth unit, configured to optimize the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model, and the throughput optimization model, so as to obtain an energy sustainable network;
a sixth unit, configured to obtain a contention window of the intelligent sensor from the output of the energy sustainable network;
the fifth unit is configured to optimize the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model, and the throughput optimization model to obtain an energy sustainable network, and includes:
building an optimized neural network for the intelligent sensor;
calculating to obtain the residual energy of the intelligent sensor according to the energy consumption model;
inputting the data transmission probability and the residual energy into the optimized neural network to obtain an energy sustainable network;
the sixth unit is configured to obtain a contention window of the intelligent sensor from the output of the energy sustainable network, and includes:
initializing the energy sustainable network, and determining an initialization system;
randomly generating an initial contention window for the intelligent sensor in the initialization system;
updating the initial contention window according to the environment reward, and outputting a target contention window;
the updating of the initial contention window according to the environment reward and the outputting of the target contention window comprise:
the environment reward $r_t$ is expressed as:
where $\eta_t$ is the throughput of the intelligent sensor and $n_{dead}$ is the number of energy interruptions of the intelligent sensor.
6. An electronic device comprising a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method of any one of claims 1-4.
7. A computer-readable storage medium, characterized in that the storage medium stores a program, which is executed by a processor to implement the method according to any one of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110920967.1A CN113747384B (en) | 2021-08-11 | 2021-08-11 | Deep reinforcement learning-based industrial internet of things energy sustainability decision mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113747384A (en) | 2021-12-03 |
CN113747384B (en) | 2023-04-07 |
Family
ID=78730740
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202110920967.1A | Active | CN113747384B (en) | 2021-08-11 | 2021-08-11 | Deep reinforcement learning-based industrial internet of things energy sustainability decision mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113747384B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111278161A (en) * | 2020-01-19 | 2020-06-12 | 电子科技大学 | WLAN protocol design and optimization method based on energy collection and deep reinforcement learning |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100943175B1 (en) * | 2007-11-30 | 2010-02-19 | 한국전자통신연구원 | A wireless sensor network structure and the control method thereof using dynamic message routing algorithm |
CN105792253B (en) * | 2016-02-25 | 2019-03-29 | 安徽农业大学 | A kind of wireless sense network medium access control optimization method |
CN110972162B (en) * | 2019-11-22 | 2022-03-25 | 南京航空航天大学 | Underwater acoustic sensor network saturation throughput solving method based on Markov chain |
Also Published As
Publication number | Publication date |
---|---|
CN113747384A (en) | 2021-12-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||