CN113747384B - Deep reinforcement learning-based industrial internet of things energy sustainability decision mechanism - Google Patents

Info

Publication number
CN113747384B
CN113747384B
Authority
CN
China
Prior art keywords
sensing module
sensor
energy
data transmission
throughput
Prior art date
Legal status
Active
Application number
CN202110920967.1A
Other languages
Chinese (zh)
Other versions
CN113747384A (en)
Inventor
韩瑜
李锦铭
秦臻
古博
姜善成
唐兆家
Current Assignee
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202110920967.1A priority Critical patent/CN113747384B/en
Publication of CN113747384A publication Critical patent/CN113747384A/en
Application granted granted Critical
Publication of CN113747384B publication Critical patent/CN113747384B/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W 4/00: Services specially adapted for wireless communication networks; Facilities therefor
    • H04W 4/30: Services specially adapted for particular environments, situations or purposes
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00: Network arrangements or protocols for supporting network services or applications
    • H04L 67/01: Protocols
    • H04L 67/12: Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W 24/00: Supervisory, monitoring or testing arrangements
    • H04W 24/02: Arrangements for optimising operational condition
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00: Reducing energy consumption in communication networks
    • Y02D 30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses an industrial Internet of things energy sustainability decision mechanism based on deep reinforcement learning, which comprises the following steps: establishing a sensor wireless local area network; establishing a state transition model based on a Markov chain according to the data transmission collision probability of the sensor to obtain the data transmission probability of the sensor; establishing an energy consumption model according to the data receiving power of the sensor; establishing a throughput optimization model according to the throughput of the sensor; optimizing the sensor wireless local area network according to the data transmission probability, the energy consumption model and the throughput optimization model of the sensor to obtain an energy sustainable network; and obtaining a competition window of the intelligent sensor through energy sustainable network output. The embodiment of the invention improves the throughput of the system through the throughput optimization model, and can be widely applied to the technical field of the Internet of things.

Description

Deep reinforcement learning-based industrial internet of things energy sustainability decision mechanism
Technical Field
The invention relates to the technical field of Internet of things, in particular to an industrial Internet of things energy sustainability decision mechanism based on deep reinforcement learning.
Background
A large number of sensors are deployed in the industrial Internet of things and form a wireless sensor network through the IEEE 802.11ax protocol so as to monitor data of various intelligent devices in real time. Most of these sensors are powered by batteries, are deployed in inaccessible locations, and some have a degree of mobility, so replacing their batteries is impractical. Such sensors may instead draw power from a charging dock or the external environment by way of wireless charging or energy harvesting (Energy Harvesting). Because a single sensor has limited energy, its energy consumption can be optimized by controlling the frequency of data transmission so that the transmission throughput in the local area network is maximized.
To handle the collisions that may occur when users transmit, the IEEE 802.11ax protocol typically adopts a conventional binary back-off algorithm: when a collision occurs, a user waits a random period of time before retransmitting. The random time is selected according to a Contention Window (CW) value; a large CW can avoid collisions but delays the sending of data and reduces throughput, while a small CW allows the user to retransmit quickly but increases the probability of collision.
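As a concrete illustration of this trade-off, the following Python sketch picks a random backoff count from a contention window that doubles after each collision. It is only an illustrative simplification of binary exponential backoff; the function name, the doubling rule and the parameter values are assumptions, not taken from the IEEE 802.11ax specification or from this patent.

```python
import random

def backoff_slots(cw_min: int, cw_max: int, retry: int) -> int:
    """Pick a random backoff count for the given retransmission attempt.

    Sketch of conventional binary exponential backoff: the contention
    window doubles after every collision (capped at cw_max), and the
    station waits a uniformly random number of slots in [0, CW - 1]
    before retransmitting.
    """
    cw = min(cw_min * (2 ** retry), cw_max)
    return random.randint(0, cw - 1)

# Example: third transmission attempt (after two collisions) with CWmin=16, CWmax=1024
print(backoff_slots(16, 1024, retry=2))  # a value in [0, 63]
```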
In summary, how to adjust the size of the sensor contention window to maximize the transmission throughput in the local area network is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
In view of this, the embodiment of the present invention provides an industrial internet of things energy sustainability decision mechanism based on deep reinforcement learning, so as to improve the transmission throughput of the system.
In one aspect, the invention provides an industrial internet of things energy sustainability decision mechanism based on deep reinforcement learning, which comprises:
establishing a sensor wireless local area network, wherein the sensor wireless local area network comprises a gateway and a sensing module, and the sensing module comprises a common sensor and an intelligent sensor;
establishing a Markov chain-based state transition model according to the data transmission collision probability of the sensing module to obtain the data transmission probability of the sensing module;
establishing an energy consumption model according to the data receiving power of the sensing module;
establishing a throughput optimization model according to the throughput of the sensing module, wherein the throughput is used for representing the data volume of a data packet sent by the sensing module within a certain time;
optimizing the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model and the throughput optimization model to obtain an energy sustainable network;
and obtaining a competition window of the intelligent sensor through the energy sustainable network output.
Optionally, the establishing a state transition model based on a markov chain according to the data transmission collision probability of the sensing module to obtain the data transmission probability of the sensing module includes:
determining the data transmission collision probability of the sensing module when the sensing module transmits data in the sensor wireless local area network;
simulating a data transmission collision process of the sensing module by combining the data transmission collision probability and the discrete time Markov chain, and determining a state transition model based on the Markov chain;
and carrying out normalization condition processing on the state transition model of the Markov chain to obtain the data transmission probability of the sensing module.
Optionally, the establishing an energy consumption model according to the data receiving power of the sensing module includes:
calculating to obtain a signal-to-noise ratio threshold value of the sensing module according to the receiving power of the sensing module;
calculating the energy consumption required by the sensing module for successful transmission according to the signal-to-noise ratio threshold;
and establishing the energy consumption model according to the energy consumption.
Optionally, the optimizing the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model, and the throughput optimization model to obtain an energy sustainable network includes:
building an optimized neural network for the intelligent sensor;
calculating to obtain the residual energy of the intelligent sensor according to the energy consumption model;
and inputting the data transmission probability and the residual energy into the optimized neural network to obtain an energy sustainable network.
Optionally, the obtaining a contention window of the smart sensor through the energy sustainable network output includes:
initializing the energy sustainable network, and determining an initialization system;
randomly generating an initial competition window for the intelligent sensor in the initialization system;
and updating the environment reward of the initial competition window and outputting a target competition window.
Optionally, the establishing a throughput optimization model according to the throughput of the sensing module includes:
the optimization model is as follows:
max_{CW} Σ_t α_t·η_t
s.t. C1: n_dead ≤ 0,
     C2: CE_min ≤ CE_j ≤ CE_max, j = 1, …, n,
     C3: d_min ≤ d_j ≤ d_max, j = 1, …, n,
wherein d is distance, t is time, α_t is the discount factor, η_t is the throughput of the smart sensor, n_dead is the number of energy interruptions of the smart sensor, CE_min is the minimum supplemental energy of the sensing module, CE_j is the supplemental energy of the jth sensor in the sensing module, CE_max is the maximum supplemental energy of the sensing module, d_min is the minimum distance between the sensing module and the gateway, d_j is the distance between the jth sensor in the sensing module and the gateway, d_max is the maximum distance between the sensing module and the gateway, j is a variable, and n is the number of sensors in the sensing module.
Optionally, the updating the environmental reward to the initial contention window and outputting the target contention window include:
the environment reward r_t is expressed as:
[formula provided as an image in the original publication: r_t is defined in terms of the throughput η_t of the smart sensor and its number of energy interruptions n_dead]
where η_t is the throughput of the smart sensor and n_dead is the number of energy interruptions of the smart sensor.
On the other hand, the embodiment of the invention also discloses an industrial internet of things energy sustainability decision-making system based on deep reinforcement learning, which comprises:
a first unit, which is used for establishing a sensor wireless local area network, wherein the sensor wireless local area network comprises a gateway and a sensing module, and the sensing module comprises a common sensor and an intelligent sensor;
the second unit is used for establishing a Markov chain-based state transition model according to the data transmission collision probability of the sensing module to obtain the data transmission probability of the sensing module;
the third unit is used for establishing an energy consumption model according to the data receiving power of the sensing module;
a fourth unit, configured to establish a throughput optimization model according to throughput of the sensing module, where the throughput is used to characterize a data volume of a data packet sent by the sensing module within a certain time;
a fifth unit, configured to optimize the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model, and the throughput optimization model, so as to obtain an energy sustainable network;
and the sixth unit is used for obtaining a competition window of the intelligent sensor through the energy sustainable network output.
On the other hand, the embodiment of the invention also discloses an electronic device, which comprises a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method as described above.
In another aspect, an embodiment of the present invention further discloses a computer-readable storage medium, where the storage medium stores a program, and the program is executed by a processor to implement the method described above.
In another aspect, an embodiment of the present invention further discloses a computer program product or a computer program, where the computer program product or the computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and the computer instructions executed by the processor cause the computer device to perform the foregoing method.
Compared with the prior art, the technical scheme adopted by the invention has the following technical effects: according to the invention, a Markov chain-based state transition model is established according to the data transmission collision probability of the sensing module, so that the data transmission probability of the sensing module is obtained; establishing an energy consumption model according to the data receiving power of the sensing module; establishing a throughput optimization model according to the throughput of the sensing module; optimizing the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model and the throughput optimization model to obtain an energy sustainable network; the throughput of the smart sensor can be improved without power interruption.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a detailed flow chart of an embodiment of the present invention;
fig. 2 is a sensor wireless local area network topology diagram according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The embodiment of the invention discloses an industrial Internet of things energy sustainability decision mechanism based on deep reinforcement learning, which comprises the following steps:
s1, establishing a sensor wireless local area network, wherein the sensor wireless local area network comprises a gateway and a sensing module, and the sensing module comprises a common sensor and an intelligent sensor;
s2, establishing a Markov chain-based state transition model according to the data transmission collision probability of the sensing module to obtain the data transmission probability of the sensing module;
s3, establishing an energy consumption model according to the data receiving power of the sensing module;
s4, establishing a throughput optimization model according to the throughput of the sensing module, wherein the throughput is used for representing the data volume of a data packet sent by the sensing module within a certain time;
s5, optimizing the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model and the throughput optimization model to obtain an energy sustainable network;
and S6, obtaining a competition window of the intelligent sensor through the energy sustainable network output.
Referring to fig. 2, a sensor wireless local area network is established under the IEEE 802.11ax protocol by a gateway 1 and a plurality of wireless sensors connected to it; the wireless sensors include an intelligent sensor 3 and a plurality of ordinary sensors 2, and the wireless sensors harvest electric energy from the surrounding environment to supplement the energy consumed by signal transmission. All sensors can only communicate over the network via the gateway 1, they cannot communicate directly with each other, and each processes only one packet at a time. A time-varying wireless channel exists between the gateway 1 and every sensor, with channel coefficients denoted h = {h_i | i ∈ n}, where the channel coefficient between the gateway 1 and the intelligent sensor 3 in the t-th period is denoted h_{n,t}; the channel is constant within each period of length T. Further, b(t) is used to represent the backoff counter at time t, and s(t) is used to represent the backoff stage (0, 1, …, m) of the sensor at time t, where m is the maximum backoff stage. The ordinary sensor 2 updates the size of its competition window in a random manner, while the intelligent sensor 3 dynamically selects the optimal competition window through deep reinforcement learning and interaction with the environment.
Further as a preferred embodiment, in the step S2, establishing a state transition model based on a markov chain according to the data transmission collision probability of the sensing module to obtain the data transmission probability of the sensing module includes:
determining the data transmission collision probability of the sensing module when the sensing module transmits data in the sensor wireless local area network;
simulating a data transmission collision process of the sensing module by combining the data transmission collision probability and the discrete time Markov chain, and determining a state transition model based on the Markov chain;
and carrying out normalization condition processing on the state transition model of the Markov chain to obtain the data transmission probability of the sensing module.
When each stage of data transmission in the sensor wireless network starts, a sensor randomly selects, within a certain range, a data transmission collision probability p, which represents the probability that a data packet transmitted on the channel collides. In each data transmission attempt, packets sent by the sensor collide with this constant and independent probability p, and the two-dimensional process {s(t), b(t)} is modeled with a discrete-time Markov chain, in which the non-zero one-step transition probabilities can be expressed as follows:
P{i, k | i, k+1} = 1,              k ∈ [0, W_i − 2], i ∈ [0, m];
P{0, k | i, 0} = (1 − p)/W_0,      k ∈ [0, W_0 − 1], i ∈ [0, m];
P{i, k | i − 1, 0} = p/W_i,        k ∈ [0, W_i − 1], i ∈ [1, m];
P{m, k | m, 0} = p/W_m,            k ∈ [0, W_m − 1],
in the formula, i is the value of s(t), k is the value of b(t), m is the maximum backoff stage, W_0 is the initial contention window, W_i is the contention window at the ith backoff stage, and W_m is the contention window at the maximum backoff stage.
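To make the backoff chain concrete, the following Python sketch simulates a single sensor's {s(t), b(t)} process for a fixed conditional collision probability p and estimates the resulting transmission probability empirically. It assumes the contention window doubles with the backoff stage (W_i = 2^i·W_0) and that the station stays in stage m after further collisions; these modelling choices follow the reconstruction above and are not code from the patent.

```python
import random

def simulate_tau(p: float, w0: int, m: int, n_slots: int = 200_000) -> float:
    """Monte Carlo estimate of the probability that the station transmits
    in a randomly chosen slot, under the backoff chain sketched above."""
    stage = 0
    counter = random.randint(0, w0 - 1)
    transmissions = 0
    for _ in range(n_slots):
        if counter == 0:
            transmissions += 1                      # counter hit zero: transmit
            if random.random() < p:                 # collision: advance the backoff stage
                stage = min(stage + 1, m)
            else:                                   # success: return to stage 0
                stage = 0
            counter = random.randint(0, w0 * 2 ** stage - 1)
        else:
            counter -= 1                            # count down one slot
    return transmissions / n_slots

# Example: p = 0.1, initial window 16, maximum backoff stage 5
print(simulate_tau(0.1, 16, 5))
```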
When the system reaches a steady state, let b_{i,k} = lim_{t→∞} P{s(t) = i, b(t) = k}, k ∈ [0, W_i − 1], i ∈ [0, m], denote the stationary distribution of the Markov chain. From the chain, the following closed-form relations are obtained:
b_{i,0} = p·b_{i−1,0}  →  b_{i,0} = p^i·b_{0,0},  0 < i < m;
b_{m,0} = p·(b_{m−1,0} + b_{m,0})  →  b_{m,0} = (p^m/(1 − p))·b_{0,0},
in the formula, b_{i−1,0} denotes the stationary probability of state {s(t) = i − 1, b(t) = 0}, p denotes the conditional collision probability, b_{i,0} denotes {s(t) = i, b(t) = 0}, b_{m−1,0} denotes {s(t) = m − 1, b(t) = 0}, b_{m,0} denotes {s(t) = m, b(t) = 0}, b_{0,0} denotes {s(t) = 0, b(t) = 0}, i is a variable, and m is the maximum backoff stage.
Further, for k ∈ [0, W_i − 1] it is possible to obtain:
b_{i,k} = ((W_i − k)/W_i) × (1 − p)·Σ_{j=0}^{m} b_{j,0},   i = 0;
b_{i,k} = ((W_i − k)/W_i) × p·b_{i−1,0},                    0 < i < m;
b_{i,k} = ((W_i − k)/W_i) × p·(b_{m−1,0} + b_{m,0}),        i = m.
Simplifying to obtain:
b_{i,k} = ((W_i − k)/W_i)·b_{i,0},  k ∈ [0, W_i − 1], i ∈ [0, m].
the markov chain is subjected to a normalization condition, which is simplified as follows:
Figure BDA0003207391850000063
namely:
Figure BDA0003207391850000064
let τ denote the probability of transmission of a sensor during a randomly selected time period, any transmission occurring when the back-off time counter equals zero, the data transmission probability τ of the sensor being:
Figure BDA0003207391850000065
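Assuming the closed form reconstructed above (the patent publication renders the formula only as an image), τ can be computed directly from p, W_0 and m as in the following sketch; for moderate p it should agree closely with the Monte Carlo estimate shown earlier.

```python
def bianchi_tau(p: float, w0: int, m: int) -> float:
    """Closed-form transmission probability of the reconstructed backoff chain
    (the expression degenerates to 0/0 at p = 0.5)."""
    num = 2.0 * (1.0 - 2.0 * p)
    den = (1.0 - 2.0 * p) * (w0 + 1.0) + p * w0 * (1.0 - (2.0 * p) ** m)
    return num / den

print(bianchi_tau(0.1, 16, 5))
```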
further preferably, in step S3, the establishing an energy consumption model according to the data received power of the sensing module includes:
calculating to obtain a signal-to-noise ratio threshold value of the sensing module according to the receiving power of the sensing module;
calculating energy consumption required by the sensing module for successful transmission according to the signal-to-noise ratio threshold;
and establishing the energy consumption model according to the energy consumption.
The sensor stores the collected energy in a supercapacitor for data transmission, and the energy capacity of the supercapacitor is denoted C_max. Considering a general energy harvesting model, we assume that in period t the sensor replenishes the supercapacitor through energy recovery at a rate of CE_t joules per millisecond. The energy collected by the sensor in each stage is time-varying because of the mobility of the sensor and changes in the transmission environment. In addition, we assume that the receiving end of every device is subject to zero-mean additive white Gaussian noise with variance σ². To ensure that the signal sent by the sensor can be successfully captured by the gateway, the signal-to-noise ratio (SNR) received at the gateway should be greater than the capture threshold. Therefore, under the condition that the probability of sensor data transmission is τ and the minimum received power is P_0, the lowest threshold ζ of the received signal-to-noise ratio at the gateway is:
ζ = P_0·|h|²/σ²,
in the formula, h is the channel coefficient, P_0 is the minimum received power, and σ² is the variance of the Gaussian white noise.
Therefore, the minimum energy E_0 consumed by the sensor for a successful transmission during period t is:
E_0 = ζ·σ²·ΔT/|h_t|²,
where ζ is the lowest signal-to-noise ratio threshold, σ² is the variance of the Gaussian white noise, ΔT is the time required for packet transmission, and h_t is the channel coefficient in period t.
To realize sustainable utilization of energy, the energy consumed by every sensor for an uplink transmission is E_0. Suppose that during period t the sensor transmits z_t data packets in total; after period t ends, the energy consumption model is:
E_{t+1} = E_t − z_t·E_0 + CE_t·T,
in the formula, E t The electric quantity of the super capacitor before the beginning of the t period, z t Number of data packets to be transmitted in time period t, E 0 For minimum energy consumption, CE t For energy recovery to replenish the capacitor during a period T, T is the time interval of the period T.
Further as a preferred embodiment, in the step S5, optimizing the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model, and the throughput optimization model to obtain an energy sustainable network includes:
building an optimized neural network for the intelligent sensor;
calculating to obtain the residual energy of the intelligent sensor according to the energy consumption model;
and inputting the data transmission probability and the residual energy into the optimized neural network to obtain an energy sustainable network.
Four neural networks are built for the intelligent sensor: an execution strategy network, used to select the competition window value; an execution evaluation network, used to evaluate the competition window value; a target strategy network, used to stabilize training and provide the competition window value needed for updating the execution evaluation network; and a target evaluation network, used to provide the next value needed for updating the execution evaluation network. Under the condition that the data transmission probability is determined, the residual energy of the intelligent sensor is calculated from the energy consumption model, and the channel state information, the distance movement, the energy recovery and the residual energy are input as observations into the execution strategy network to obtain the optimized wireless local area network, namely the energy sustainable network.
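A minimal PyTorch sketch of these four networks is given below: an execution strategy (actor) network that maps the smart sensor's observation to a competition window value, an execution evaluation (critic) network that scores an observation-window pair, and target copies of both. The observation size, layer widths, output scaling and learning rates are illustrative assumptions, not parameters disclosed in the patent.

```python
import copy
import torch
import torch.nn as nn

class StrategyNet(nn.Module):
    """Execution strategy network: observation -> competition window value in (0, 1),
    to be rescaled to [CW_min, CW_max] outside the network."""
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, obs):
        return self.net(obs)

class EvaluationNet(nn.Module):
    """Execution evaluation network: (observation, window value) -> state-action value."""
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, obs, cw):
        return self.net(torch.cat([obs, cw], dim=-1))

obs_dim = 4  # e.g. channel state, distance change, harvested energy, residual energy (assumed)
strategy, evaluation = StrategyNet(obs_dim), EvaluationNet(obs_dim)
target_strategy, target_evaluation = copy.deepcopy(strategy), copy.deepcopy(evaluation)
strategy_opt = torch.optim.Adam(strategy.parameters(), lr=1e-3)
evaluation_opt = torch.optim.Adam(evaluation.parameters(), lr=1e-3)
```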
Further preferably, in step S6, obtaining the contention window of the smart sensor through the energy sustainable network output includes:
initializing the energy sustainable network, and determining an initialization system;
randomly generating an initial competition window for the intelligent sensor in the initialization system;
and updating the environment reward of the initial competition window and outputting a target competition window.
First, the experience replay pool of the energy sustainable network, the competition windows of the ordinary sensors and the environment of the system are initialized. Each ordinary sensor independently and randomly sets the size of its initial competition window within a certain range, and a competition window value is randomly generated for the intelligent sensor in the initialized system. According to the energy consumption model, energy can be used sustainably provided that every sensor consumes E_0 for each uplink transmission of a single data packet. All sensors then transmit data packets according to the SEH-CSMA/CA protocol; after a certain period of time, the number of data packets successfully transmitted by the intelligent sensor and the occurrence of energy interruptions are counted, and the throughput of the intelligent sensor in this stage is calculated. The currently obtained reward r is calculated according to the reward formula. Meanwhile, the intelligent sensor calculates the loss function of the execution evaluation network using the mean square error and updates the execution evaluation network by backward gradient propagation; at the same time, the states of all sensors and the competition window value of the intelligent sensor at the current moment are input into the execution evaluation network to obtain the state-action value, and the execution strategy network is updated with this value by backward gradient propagation. In addition, the target strategy network and the target evaluation network are updated gradually by copying a certain proportion of the weights at each step. By iterating this process repeatedly, the intelligent sensor can adapt to changes in dynamic conditions and obtain the size of its competition window.
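The evaluation-network loss and the gradual target-network update described above can be sketched as follows. The batch layout, the discount factor and the copy proportion rho are assumptions made for illustration; only the overall structure (mean-squared TD error for the evaluation network, proportional copying into the target networks) follows the description.

```python
import torch
import torch.nn.functional as F

def evaluation_loss(evaluation, target_evaluation, target_strategy, batch, gamma: float = 0.99):
    """Mean square error loss used to update the execution evaluation network.
    `batch` is assumed to hold tensors (obs, cw, reward, next_obs) with a leading batch dim."""
    obs, cw, reward, next_obs = batch
    with torch.no_grad():
        target_q = reward + gamma * target_evaluation(next_obs, target_strategy(next_obs))
    return F.mse_loss(evaluation(obs, cw), target_q)

def soft_update(target: torch.nn.Module, source: torch.nn.Module, rho: float = 0.01):
    """Copy a small proportion of the online network's weights into the target
    network at every step, as in the 'copying a certain proportion' update above."""
    with torch.no_grad():
        for t_param, s_param in zip(target.parameters(), source.parameters()):
            t_param.mul_(1.0 - rho).add_(rho * s_param)
```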
Further as a preferred embodiment, establishing a throughput optimization model according to the throughput of the sensing module includes:
the optimization model is as follows:
max_{CW} Σ_t α_t·η_t
s.t. C1: n_dead ≤ 0,
     C2: CE_min ≤ CE_j ≤ CE_max, j = 1, …, n,
     C3: d_min ≤ d_j ≤ d_max, j = 1, …, n,
wherein d is distance, t is time, α_t is the discount factor, η_t is the throughput of the smart sensor, n_dead is the number of energy interruptions of the smart sensor, CE_min is the minimum supplemental energy of the sensing module, CE_j is the supplemental energy of the jth sensor in the sensing module, CE_max is the maximum supplemental energy of the sensing module, d_min is the minimum distance between the sensing module and the gateway, d_j is the distance between the jth sensor in the sensing module and the gateway, d_max is the maximum distance between the sensing module and the gateway, j is a variable, and n is the number of sensors in the sensing module.
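For illustration, the objective and constraints of this optimization model can be checked numerically as in the sketch below; the discounted-sum form of the objective and the per-sensor bound checks follow the reconstruction above and are assumptions rather than verbatim patent equations.

```python
from typing import Sequence

def discounted_throughput(etas: Sequence[float], alpha: float) -> float:
    """Objective value: discounted sum of the smart sensor's per-period throughput."""
    return sum((alpha ** t) * eta for t, eta in enumerate(etas))

def constraints_satisfied(n_dead: int,
                          ce: Sequence[float], ce_min: float, ce_max: float,
                          d: Sequence[float], d_min: float, d_max: float) -> bool:
    """Check C1 (no energy interruptions), C2 (supplemental energy bounds) and
    C3 (gateway distance bounds) for all n sensors of the sensing module."""
    c1 = n_dead <= 0
    c2 = all(ce_min <= ce_j <= ce_max for ce_j in ce)
    c3 = all(d_min <= d_j <= d_max for d_j in d)
    return c1 and c2 and c3
```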
Further as a preferred embodiment, updating the environment reward for the initial contention window, and outputting a target contention window, includes:
the environment reward r_t is expressed as:
[formula provided as an image in the original publication: r_t is defined in terms of the throughput η_t of the smart sensor and its number of energy interruptions n_dead]
where η_t is the throughput of the smart sensor and n_dead is the number of energy interruptions of the smart sensor.
With reference to fig. 1, the process of the invention specifically comprises the following steps:
A sensor wireless local area network consisting of a gateway, ordinary sensors and an intelligent sensor is established. The sensors transmit data in the wireless local area network and collisions occur during transmission; the probability with which each sensor transmits data in the time-varying environment is obtained through mathematical analysis based on the Markov-chain state transition model. An energy consumption model is established under the condition that the data reception power at the sensor is determined as P_0, and a throughput optimization model is established according to the throughput of the sensor. With the data transmission probability of the sensor determined, the sensor wireless local area network is optimized according to the energy consumption model and the throughput optimization model to obtain an energy sustainable network, and the energy sustainable network is calculated and updated to obtain the competition window of the intelligent sensor.
The embodiment of the invention also provides an industrial internet of things energy sustainability decision system based on deep reinforcement learning, which comprises:
a first unit, which is used for establishing a sensor wireless local area network, wherein the sensor wireless local area network comprises a gateway and a sensing module, and the sensing module comprises a common sensor and an intelligent sensor;
the second unit is used for establishing a Markov chain-based state transition model according to the data transmission collision probability of the sensing module to obtain the data transmission probability of the sensing module;
the third unit is used for establishing an energy consumption model according to the data receiving power of the sensing module;
a fourth unit, configured to establish a throughput optimization model according to throughput of the sensing module, where the throughput is used to characterize a data volume of a data packet sent by the sensing module within a certain time;
a fifth unit, configured to optimize the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model, and the throughput optimization model, so as to obtain an energy sustainable network;
and the sixth unit is used for obtaining a competition window of the intelligent sensor through the energy sustainable network output.
Corresponding to the method of fig. 1, an embodiment of the present invention further provides an electronic device, including a processor and a memory; the memory is used for storing programs; the processor executes the program to implement the method as described above.
Corresponding to the method of fig. 1, the embodiment of the present invention also provides a computer-readable storage medium, which stores a program, and the program is executed by a processor to implement the method as described above.
The embodiment of the invention also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and executed by the processor to cause the computer device to perform the method illustrated in fig. 1.
In summary, the embodiments of the present invention have the following advantages:
(1) According to the embodiment of the invention, the data transmission probability of each sensor is analyzed through the Markov chain-based state transition model, so that the accuracy of the system can be improved.
(2) According to the embodiment of the invention, the sensor wireless local area network is optimized through the throughput optimization model, so that the throughput of the system can be improved.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following technologies, which are well known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (7)

1. A deep reinforcement learning-based industrial internet of things energy sustainability decision method, characterized by comprising the following steps:
establishing a sensor wireless local area network, wherein the sensor wireless local area network comprises a gateway and a sensing module, and the sensing module comprises a common sensor and an intelligent sensor;
establishing a Markov chain-based state transition model according to the data transmission collision probability of the sensing module to obtain the data transmission probability of the sensing module;
establishing an energy consumption model according to the data receiving power of the sensing module;
establishing a throughput optimization model according to the throughput of the sensing module, wherein the throughput is used for representing the data volume of a data packet sent by the sensing module within a certain time;
optimizing the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model and the throughput optimization model to obtain an energy sustainable network;
obtaining a competition window of the intelligent sensor through the energy sustainable network output;
wherein, the optimizing the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model and the throughput optimization model to obtain an energy sustainable network comprises:
building an optimized neural network for the intelligent sensor;
calculating to obtain the residual energy of the intelligent sensor according to the energy consumption model;
inputting the data transmission probability and the residual energy into the optimized neural network to obtain a sustainable energy network;
the obtaining of the contention window of the smart sensor through the energy sustainable network output includes:
initializing the energy sustainable network, and determining an initialization system;
randomly generating an initial competition window for the intelligent sensor in the initialization system;
updating the environment reward of the initial competition window and outputting a target competition window;
the updating of the environment reward to the initial competition window and the outputting of the target competition window comprise:
the environment reward r_t is expressed as:
[formula provided as an image in the original publication: r_t is defined in terms of the throughput η_t of the smart sensor and its number of energy interruptions n_dead]
where η_t is the throughput of the smart sensor and n_dead is the number of energy interruptions of the smart sensor.
2. The deep reinforcement learning-based industrial internet of things energy sustainability decision method as claimed in claim 1, wherein the establishing a state transition model based on a markov chain according to the data transmission collision probability of the sensing module to obtain the data transmission probability of the sensing module comprises:
determining the data transmission collision probability of the sensing module when the sensing module transmits data in the sensor wireless local area network;
simulating a data transmission collision process of the sensing module by combining the data transmission collision probability and the discrete time Markov chain, and determining a state transition model based on the Markov chain;
and carrying out normalization condition processing on the state transition model of the Markov chain to obtain the data transmission probability of the sensing module.
3. The deep reinforcement learning-based industrial internet of things energy sustainability decision method according to claim 1, wherein the building of the energy consumption model according to the data receiving power of the sensing module comprises:
calculating to obtain a signal-to-noise ratio threshold value of the sensing module according to the receiving power of the sensing module;
calculating energy consumption required by the sensing module for successful transmission according to the signal-to-noise ratio threshold;
and establishing the energy consumption model according to the energy consumption.
4. The deep reinforcement learning-based industrial internet of things energy sustainability decision method according to claim 1, wherein the establishing of a throughput optimization model according to the throughput of the sensing module comprises:
the optimization model is as follows:
max_{CW} Σ_t α_t·η_t
s.t. C1: n_dead ≤ 0,
     C2: CE_min ≤ CE_j ≤ CE_max, j = 1, …, n,
     C3: d_min ≤ d_j ≤ d_max, j = 1, …, n,
wherein d is distance, t is time, α_t is the discount factor, η_t is the throughput of the smart sensor, n_dead is the number of energy interruptions of the smart sensor, CE_min is the minimum supplemental energy of the sensing module, CE_j is the supplemental energy of the jth sensor in the sensing module, CE_max is the maximum supplemental energy of the sensing module, d_min is the minimum distance between the sensing module and the gateway, d_j is the distance between the jth sensor in the sensing module and the gateway, d_max is the maximum distance between the sensing module and the gateway, j is a variable, and n is the number of sensors in the sensing module.
5. An industrial internet of things energy sustainability decision system based on deep reinforcement learning, comprising:
a first unit, which is used for establishing a sensor wireless local area network, wherein the sensor wireless local area network comprises a gateway and a sensing module, and the sensing module comprises a common sensor and an intelligent sensor;
the second unit is used for establishing a Markov chain-based state transition model according to the data transmission collision probability of the sensing module to obtain the data transmission probability of the sensing module;
the third unit is used for establishing an energy consumption model according to the data receiving power of the sensing module;
a fourth unit, configured to establish a throughput optimization model according to throughput of the sensing module, where the throughput is used to characterize a data volume of a data packet sent by the sensing module within a certain time;
a fifth unit, configured to optimize the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model, and the throughput optimization model, so as to obtain a sustainable energy network;
a sixth unit, configured to obtain a contention window of the smart sensor through the energy sustainable network output;
the fifth unit is configured to optimize the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model, and the throughput optimization model to obtain an energy sustainable network, and includes:
building an optimized neural network for the intelligent sensor;
calculating to obtain the residual energy of the intelligent sensor according to the energy consumption model;
inputting the data transmission probability and the residual energy into the optimized neural network to obtain an energy sustainable network;
the sixth unit is configured to obtain a contention window of the smart sensor through the energy sustainable network output, and includes:
initializing the energy sustainable network, and determining an initialization system;
randomly generating an initial competition window for the intelligent sensor in the initialization system;
updating the environment reward of the initial competition window and outputting a target competition window;
the updating of the environment reward to the initial competition window and the outputting of the target competition window comprise:
the environment reward r_t is expressed as:
[formula provided as an image in the original publication: r_t is defined in terms of the throughput η_t of the smart sensor and its number of energy interruptions n_dead]
where η_t is the throughput of the smart sensor and n_dead is the number of energy interruptions of the smart sensor.
6. An electronic device comprising a processor and a memory;
the memory is used for storing programs;
the processor executing the program realizes the method of any one of claims 1-4.
7. A computer-readable storage medium, characterized in that the storage medium stores a program, which is executed by a processor to implement the method according to any one of claims 1-4.
CN202110920967.1A 2021-08-11 2021-08-11 Deep reinforcement learning-based industrial internet of things energy sustainability decision mechanism Active CN113747384B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110920967.1A CN113747384B (en) 2021-08-11 2021-08-11 Deep reinforcement learning-based industrial internet of things energy sustainability decision mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110920967.1A CN113747384B (en) 2021-08-11 2021-08-11 Deep reinforcement learning-based industrial internet of things energy sustainability decision mechanism

Publications (2)

Publication Number Publication Date
CN113747384A CN113747384A (en) 2021-12-03
CN113747384B (en) 2023-04-07

Family

ID=78730740

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110920967.1A Active CN113747384B (en) 2021-08-11 2021-08-11 Deep reinforcement learning-based industrial internet of things energy sustainability decision mechanism

Country Status (1)

Country Link
CN (1) CN113747384B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111278161A (en) * 2020-01-19 2020-06-12 电子科技大学 WLAN protocol design and optimization method based on energy collection and deep reinforcement learning

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100943175B1 (en) * 2007-11-30 2010-02-19 한국전자통신연구원 A wireless sensor network structure and the control method thereof using dynamic message routing algorithm
CN105792253B (en) * 2016-02-25 2019-03-29 安徽农业大学 A kind of wireless sense network medium access control optimization method
CN110972162B (en) * 2019-11-22 2022-03-25 南京航空航天大学 Underwater acoustic sensor network saturation throughput solving method based on Markov chain

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111278161A (en) * 2020-01-19 2020-06-12 电子科技大学 WLAN protocol design and optimization method based on energy collection and deep reinforcement learning

Also Published As

Publication number Publication date
CN113747384A (en) 2021-12-03

Similar Documents

Publication Publication Date Title
CN112668128B (en) Method and device for selecting terminal equipment nodes in federal learning system
CN107665230B (en) Training method and device of user behavior prediction model for intelligent home control
US20210081763A1 (en) Electronic device and method for controlling the electronic device thereof
CN102592171A (en) Method and device for predicting cognitive network performance based on BP (Back Propagation) neural network
CN110519816B (en) Wireless roaming control method, device, storage medium and terminal equipment
Wu et al. Learn to sense: A meta-learning-based sensing and fusion framework for wireless sensor networks
CN110926782A (en) Circuit breaker fault type judgment method and device, electronic equipment and storage medium
CN114727316B (en) Internet of things transmission method and device based on depth certainty strategy
Kim et al. Performance analysis of the energy adaptive MAC protocol for wireless sensor networks with RF energy transfer
CN114519433A (en) Multi-agent reinforcement learning and strategy execution method and computer equipment
CN114238658A (en) Link prediction method and device of time sequence knowledge graph and electronic equipment
CN113747384B (en) Deep reinforcement learning-based industrial internet of things energy sustainability decision mechanism
CN112383485A (en) Network congestion control method and device
CN113347125B (en) Bayesian neural network channel estimation method and device for MIMO-OFDM communication system
WO2024067115A1 (en) Training method for gflownet, and related apparatus
CN117193008A (en) Small sample robust imitation learning training method oriented to high-dimensional disturbance environment, electronic equipment and storage medium
CN114170560B (en) Multi-device edge video analysis system based on deep reinforcement learning
CN116796821A (en) Efficient neural network architecture searching method and device for 3D target detection algorithm
CN114500383B (en) Intelligent congestion control method, system and medium for space-earth integrated information network
CN115980586A (en) Power battery health state prediction method and device and computer equipment
Wang et al. Adaptive trajectory-constrained exploration strategy for deep reinforcement learning
CN113891287B (en) V2I access method and system for ensuring vehicle information age fairness in Internet of vehicles
CN113673665B (en) Method, system, device and medium for optimizing wireless energy supply system of capsule robot
WO2024159986A1 (en) Method and apparatus for generating dynamic threshold parameter of wireless local area network
CN117749625B (en) Network performance optimization system and method based on deep Q network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant