CN113747384B - Energy Sustainability Decision-Making Mechanism for Industrial Internet of Things Based on Deep Reinforcement Learning - Google Patents

Publication number
CN113747384B
Authority
CN
China
Prior art keywords
sensor
energy
sensing module
data transmission
throughput
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110920967.1A
Other languages
Chinese (zh)
Other versions
CN113747384A (en)
Inventor
韩瑜
李锦铭
秦臻
古博
姜善成
唐兆家
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University
Priority to CN202110920967.1A
Publication of CN113747384A
Application granted
Publication of CN113747384B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W4/00: Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/30: Services specially adapted for particular environments, situations or purposes
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/12: Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W24/00: Supervisory, monitoring or testing arrangements
    • H04W24/02: Arrangements for optimising operational condition
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses an energy sustainability decision-making mechanism for the Industrial Internet of Things based on deep reinforcement learning, including: establishing a sensor wireless local area network; establishing a Markov-chain-based state transition model according to the data transmission collision probability of the sensors, and obtaining the data transmission probability of the sensors; establishing an energy consumption model according to the data receiving power of the sensors; establishing a throughput optimization model according to the throughput of the sensors; optimizing the sensor wireless local area network according to the data transmission probability, the energy consumption model and the throughput optimization model to obtain an energy sustainable network; and obtaining the contention window of the smart sensor from the output of the energy sustainable network. The embodiment of the present invention improves system throughput through the throughput optimization model and can be widely applied in the technical field of the Internet of Things.

Figure 202110920967

Description

Energy sustainability decision-making mechanism for the Industrial Internet of Things based on deep reinforcement learning

Technical Field

The present invention relates to the technical field of the Internet of Things, and in particular to an energy sustainability decision-making mechanism for the Industrial Internet of Things based on deep reinforcement learning.

Background Art

A large number of sensors are deployed in the Industrial Internet of Things. They form a wireless sensor network over the IEEE 802.11ax protocol to monitor data from various smart devices in real time. Most of these sensors are battery-powered and deployed in hard-to-reach locations, and some of them are mobile, so replacing their batteries is impractical. Such sensors can instead obtain energy from a charging base or from the surrounding environment through wireless charging or energy harvesting. Because the energy of a single sensor is limited, the sensor's energy consumption can be optimized by controlling how often it sends data, so as to maximize the transmission throughput in the local area network.

To handle collisions that may occur during transmission, the IEEE 802.11ax protocol typically uses the traditional binary backoff algorithm: when a collision occurs, a user waits for a random period of time before retransmitting. The random waiting time depends on the contention window (CW) value. A large CW helps avoid collisions but delays transmission and reduces throughput; a small CW lets users retransmit quickly but increases the collision probability.
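The CW trade-off described above can be illustrated with a minimal sketch of binary exponential backoff. The function names and the bounds `cw_min=16` and `cw_max=1024` are illustrative assumptions, not values taken from the patent or quoted from the IEEE 802.11ax specification.

```python
import random


def next_contention_window(cw, collided, cw_min=16, cw_max=1024):
    """Binary exponential backoff: double CW after a collision, reset after success.

    cw_min and cw_max are illustrative bounds (assumptions).
    """
    if collided:
        return min(2 * cw, cw_max)
    return cw_min


def draw_backoff_slots(cw, rng=random):
    """Draw a uniform backoff counter in [0, cw - 1] slots."""
    return rng.randrange(cw)


# A sensor that collides three times in a row, then succeeds.
cw = 16
for collided in (True, True, True, False):
    wait = draw_backoff_slots(cw)  # slots to wait before the next attempt
    cw = next_contention_window(cw, collided)
```

A larger window spreads retransmissions over more slots, which lowers the chance that two sensors pick the same slot but lengthens the average wait.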

In summary, how to adjust the contention window size of the sensors so as to maximize the transmission throughput in the local area network is a technical problem that those skilled in the art currently need to solve.

Summary of the Invention

In view of this, embodiments of the present invention provide an energy sustainability decision-making mechanism for the Industrial Internet of Things based on deep reinforcement learning, so as to improve the transmission throughput of the system.

In one aspect, the present invention provides an energy sustainability decision-making mechanism for the Industrial Internet of Things based on deep reinforcement learning, including:

establishing a sensor wireless local area network, wherein the sensor wireless local area network includes a gateway and a sensing module, and the sensing module includes ordinary sensors and a smart sensor;

establishing a Markov-chain-based state transition model according to the data transmission collision probability of the sensing module, to obtain the data transmission probability of the sensing module;

establishing an energy consumption model according to the data receiving power of the sensing module;

establishing a throughput optimization model according to the throughput of the sensing module, wherein the throughput characterizes the amount of packet data sent by the sensing module within a given period of time;

optimizing the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model and the throughput optimization model, to obtain an energy sustainable network;

obtaining the contention window of the smart sensor from the output of the energy sustainable network.

Optionally, establishing a Markov-chain-based state transition model according to the data transmission collision probability of the sensing module, to obtain the data transmission probability of the sensing module, includes:

determining the data transmission collision probability that arises when the sensing module transmits data in the sensor wireless local area network;

modeling the data transmission collision process of the sensing module with a discrete-time Markov chain, using the data transmission collision probability, to determine the Markov-chain-based state transition model;

applying the normalization condition to the Markov-chain state transition model to obtain the data transmission probability of the sensing module.

Optionally, establishing an energy consumption model according to the data receiving power of the sensing module includes:

calculating the signal-to-noise ratio threshold of the sensing module according to the received power of the sensing module;

calculating, according to the signal-to-noise ratio threshold, the energy the sensing module needs to consume for a successful transmission;

establishing the energy consumption model according to the energy consumption.

Optionally, optimizing the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model and the throughput optimization model, to obtain an energy sustainability system, includes:

building optimization neural networks for the smart sensor;

calculating the remaining energy of the smart sensor according to the energy consumption model;

inputting the data transmission probability and the remaining energy into the optimization neural networks to obtain the energy sustainable network.

Optionally, obtaining the contention window of the smart sensor from the output of the energy sustainable network includes:

initializing the energy sustainable network to determine an initialized system;

randomly generating an initial contention window for the smart sensor in the initialized system;

updating the initial contention window using the environmental reward, and outputting the target contention window.

Optionally, establishing a throughput optimization model according to the throughput of the sensing module includes:

The optimization model is:

Figure BDA0003207391850000021

s.t. C1: n_dead ≤ 0,

Figure BDA0003207391850000022

Figure BDA0003207391850000023

where d denotes distance, t denotes time, α_t denotes the discount factor, η_t denotes the throughput of the smart sensor, n_dead denotes the number of energy outages of the smart sensor, CE_min denotes the minimum replenished energy of the sensing module, CE_j denotes the replenished energy of the j-th sensor in the sensing module, CE_max denotes the maximum replenished energy of the sensing module, d_min denotes the minimum distance between the sensing module and the gateway, d_j denotes the distance between the j-th sensor in the sensing module and the gateway, d_max denotes the maximum distance between the sensing module and the gateway, j is the sensor index, and n denotes the number of sensors in the sensing module.
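The formula images for the objective and for constraints C2 and C3 are not reproduced in this text. The following sketch shows one reading that is consistent with the symbol definitions: it assumes the objective is a discounted sum of per-period throughputs, and that C2 and C3 bound each sensor's replenished energy and distance between the stated minimum and maximum. Both assumptions, and all function names, are illustrative and not confirmed by the source.

```python
def discounted_throughput(throughputs, discounts):
    """Assumed objective: sum over periods t of (discount at t) * (throughput at t)."""
    return sum(a * eta for a, eta in zip(discounts, throughputs))


def is_feasible(n_dead, harvest, distances, ce_min, ce_max, d_min, d_max):
    """C1 is stated in the text; C2 and C3 are assumed box constraints."""
    c1 = n_dead <= 0                                  # no energy outage
    c2 = all(ce_min <= ce <= ce_max for ce in harvest)  # replenished energy bounds
    c3 = all(d_min <= d <= d_max for d in distances)    # distance bounds
    return c1 and c2 and c3


# Two periods with throughputs 1.0 and 2.0 and discounts 1.0 and 0.5.
objective = discounted_throughput([1.0, 2.0], [1.0, 0.5])
ok = is_feasible(0, [0.2, 0.4], [5.0, 12.0], 0.1, 0.5, 1.0, 20.0)
```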

Optionally, updating the initial contention window using the environmental reward and outputting the target contention window includes:

The environmental reward r_t is expressed as:

Figure BDA0003207391850000031

where η_t denotes the throughput of the smart sensor, and n_dead denotes the number of energy outages of the smart sensor.

In another aspect, an embodiment of the present invention further discloses an energy sustainability decision-making system for the Industrial Internet of Things based on deep reinforcement learning, including:

a first unit, configured to establish a sensor wireless local area network, wherein the sensor wireless local area network includes a gateway and a sensing module, and the sensing module includes ordinary sensors and a smart sensor;

a second unit, configured to establish a Markov-chain-based state transition model according to the data transmission collision probability of the sensing module, to obtain the data transmission probability of the sensing module;

a third unit, configured to establish an energy consumption model according to the data receiving power of the sensing module;

a fourth unit, configured to establish a throughput optimization model according to the throughput of the sensing module, wherein the throughput characterizes the amount of packet data sent by the sensing module within a given period of time;

a fifth unit, configured to optimize the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model and the throughput optimization model, to obtain an energy sustainable network;

a sixth unit, configured to obtain the contention window of the smart sensor from the output of the energy sustainable network.

In another aspect, an embodiment of the present invention further discloses an electronic device, including a processor and a memory;

the memory is used to store a program;

the processor executes the program to implement the method described above.

In another aspect, an embodiment of the present invention further discloses a computer-readable storage medium that stores a program, and the program is executed by a processor to implement the method described above.

In another aspect, an embodiment of the present invention further discloses a computer program product or computer program that includes computer instructions stored in a computer-readable storage medium. A processor of a computer device can read the computer instructions from the computer-readable storage medium and execute them, causing the computer device to perform the method described above.

Compared with the prior art, the above technical solution has the following technical effects: the present invention establishes a Markov-chain-based state transition model according to the data transmission collision probability of the sensing module to obtain the data transmission probability of the sensing module; establishes an energy consumption model according to the data receiving power of the sensing module; establishes a throughput optimization model according to the throughput of the sensing module; and optimizes the sensor wireless local area network according to the data transmission probability, the energy consumption model and the throughput optimization model to obtain an energy sustainable network. It can thereby improve the throughput of the smart sensor without energy outages.

Brief Description of the Drawings

To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a detailed flowchart of an embodiment of the present invention;

FIG. 2 is a topology diagram of the sensor wireless local area network according to an embodiment of the present invention.

Detailed Description

To make the purpose, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present application and are not intended to limit it.

An embodiment of the present invention provides an energy sustainability decision-making mechanism for the Industrial Internet of Things based on deep reinforcement learning, including:

S1. establishing a sensor wireless local area network, wherein the sensor wireless local area network includes a gateway and a sensing module, and the sensing module includes ordinary sensors and a smart sensor;

S2. establishing a Markov-chain-based state transition model according to the data transmission collision probability of the sensing module, to obtain the data transmission probability of the sensing module;

S3. establishing an energy consumption model according to the data receiving power of the sensing module;

S4. establishing a throughput optimization model according to the throughput of the sensing module, wherein the throughput characterizes the amount of packet data sent by the sensing module within a given period of time;

S5. optimizing the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model and the throughput optimization model, to obtain an energy sustainable network;

S6. obtaining the contention window of the smart sensor from the output of the energy sustainable network.

Referring to FIG. 2, a sensor wireless local area network is established under the IEEE 802.11ax protocol by a gateway 1 and multiple wireless sensors connected to it. The wireless sensors comprise one smart sensor 3 and multiple ordinary sensors 2, and they harvest electrical energy from the surrounding environment to power their signal transmission. All sensors can communicate only through the gateway 1; they cannot communicate directly with one another, and each handles only one data packet at a time. A time-varying wireless channel exists between the gateway 1 and every sensor. The channel coefficients are denoted h = {h_i | i ∈ n}, where the channel coefficient between the gateway 1 and the smart sensor 3 in the t-th period is denoted h_{n,t}; the channel is constant within each period of length T. In addition, b(t) (0 ≤ b(t) ≤ m) denotes the backoff count at time t, where m is the maximum backoff count minus one, and s(t) denotes the stochastic process of the backoff stage (0, 1, ..., m) of a sensor at time t. The ordinary sensors 2 update their contention window sizes randomly, whereas the smart sensor 3 dynamically selects the best contention window by interacting with the environment through deep reinforcement learning.

As a further preferred implementation, in step S2 above, establishing a Markov-chain-based state transition model according to the data transmission collision probability of the sensing module, to obtain the data transmission probability of the sensing module, includes:

determining the data transmission collision probability that arises when the sensing module transmits data in the sensor wireless local area network;

modeling the data transmission collision process of the sensing module with a discrete-time Markov chain, using the data transmission collision probability, to determine the Markov-chain-based state transition model;

applying the normalization condition to the Markov-chain state transition model to obtain the data transmission probability of the sensing module.

At the start of each stage of data transmission in the sensor wireless network, the sensor randomly selects a value P within a certain range as the data transmission collision probability, which characterizes the probability that a packet transmitted on the channel collides. In each transmission attempt, a packet sent by the sensor collides with constant and independent probability P. A discrete-time Markov chain is used to model the two-dimensional process {s(t), b(t)}; in this Markov chain, the non-zero one-step transition probabilities can be expressed as:

Figure BDA0003207391850000051
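The formula image is not reproduced in this text. For orientation only, the one-step transition probabilities of the standard two-dimensional backoff chain (the well-known Bianchi model of the 802.11 distributed coordination function) are given below, writing the collision probability as lowercase p. Whether the patent's figure has exactly this form is an assumption, though the symbols W_0, W_i, W_m and m defined in the text match it.

```latex
\begin{aligned}
P\{i,k \mid i,k+1\} &= 1, && k \in [0, W_i-2],\ i \in [0,m] \\
P\{0,k \mid i,0\} &= \frac{1-p}{W_0}, && k \in [0, W_0-1],\ i \in [0,m] \\
P\{i,k \mid i-1,0\} &= \frac{p}{W_i}, && k \in [0, W_i-1],\ i \in [1,m] \\
P\{m,k \mid m,0\} &= \frac{p}{W_m}, && k \in [0, W_m-1]
\end{aligned}
```

The first line is the backoff counter decrementing by one each slot; the second is a successful transmission resetting the stage to 0; the third is a collision moving to the next stage; the fourth is a collision at the last stage, which stays at stage m.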

In the formula, i is the value of s(t), k is the value of b(t), m is the maximum backoff count minus one, W_0 is the initial contention window of the smart sensor, W_i is the contention window of the i-th ordinary sensor, and W_m is the contention window of the smart sensor.

When the system reaches steady state, b_{i,k} = lim_{t→∞} P{s(t) = i, b(t) = k}, k ∈ [0, W_i − 1], i ∈ [0, m], denotes the stationary distribution of the Markov chain. The closed-form solution of this Markov chain is derived as follows:

b_{i-1,0} · p = b_{i,0} → b_{i,0} = p^i b_{0,0}, 0 < i < m;

Figure BDA0003207391850000052

In the formula, b_{i-1,0} denotes {s(t) = i − 1, b(t) = 0}, p denotes the conditional collision probability, b_{i,0} denotes {s(t) = i, b(t) = 0}, b_{m-1,0} denotes {s(t) = m − 1, b(t) = 0}, b_{m,0} denotes {s(t) = m, b(t) = 0}, b_{0,0} denotes {s(t) = 0, b(t) = 0}, i is the stage index, and m is the maximum backoff count minus one.

It then follows that:

Figure BDA0003207391850000061

Simplifying gives:

Figure BDA0003207391850000062

Imposing the normalization condition on the Markov chain and simplifying gives:

Figure BDA0003207391850000063

That is:

Figure BDA0003207391850000064

Let τ denote the probability that a sensor transmits in a randomly chosen time slot. A transmission occurs whenever the backoff time counter equals zero, so the data transmission probability τ of the sensor is:

Figure BDA0003207391850000065
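Since the formula images are not reproduced, the following sketch computes τ numerically from the stationary relations rather than quoting the patent's closed form. It uses b_{i,0} = p^i b_{0,0} from the text and additionally assumes, as in the standard Bianchi model, that b_{m,0} = p^m/(1 − p) · b_{0,0}, that b_{i,k} = (W_i − k)/W_i · b_{i,0}, and that windows double per stage (W_i = 2^i W_0). These assumptions and the function name are illustrative, not taken from the source.

```python
def transmission_probability(p, w0, m):
    """Probability that a sensor transmits in a random slot.

    p  : conditional collision probability, 0 <= p < 1
    w0 : initial contention window
    m  : index of the last backoff stage
    Assumes W_i = 2**i * w0 (binary backoff), which the source does not state.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    # b_{i,0} / b_{0,0} for stages 0..m-1, then the last stage m.
    ratios = [p ** i for i in range(m)] + [p ** m / (1.0 - p)]
    windows = [w0 * 2 ** i for i in range(m + 1)]
    # Normalization: sum over i of b_{i,0} * (W_i + 1) / 2 equals 1.
    norm = sum(r * (w + 1) / 2.0 for r, w in zip(ratios, windows))
    b00 = 1.0 / norm
    # tau is the sum of b_{i,0} over all stages, which equals b00 / (1 - p).
    return b00 / (1.0 - p)


tau_no_collision = transmission_probability(0.0, 16, 5)
tau = transmission_probability(0.2, 32, 5)
```

With no collisions the sensor always stays at stage 0, so τ reduces to 2/(W_0 + 1); as p grows, the sensor spends more time in longer backoff stages and τ falls.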

As a further preferred implementation, in step S3 above, establishing an energy consumption model according to the data receiving power of the sensing module includes:

calculating the signal-to-noise ratio threshold of the sensing module according to the received power of the sensing module;

calculating, according to the signal-to-noise ratio threshold, the energy the sensing module needs to consume for a successful transmission;

establishing the energy consumption model according to the energy consumption.

The sensor stores the harvested energy in a supercapacitor and uses it for data transmission; the energy capacity of the supercapacitor is denoted C_max. Considering a general energy harvesting model, we assume that in period t the sensor replenishes the supercapacitor through energy harvesting at a rate of CE_t joules per millisecond. Because the sensors are mobile and the transmission environment changes, the energy a sensor harvests in each stage is time-varying. In addition, we assume that the receivers of all devices experience additive white Gaussian noise with zero mean and equal variance σ^2. To ensure that the signal sent by a sensor is successfully captured by the gateway, the signal-to-noise ratio (SNR) received at the gateway must exceed the capture threshold. Therefore, given a sensor data transmission probability τ and a minimum received power P_0, the minimum SNR threshold ζ at the gateway is:

Figure BDA0003207391850000071

In the formula, h is the channel coefficient, P_0 is the minimum received power, and σ^2 is the variance of the white Gaussian noise.

Therefore, within period t, the minimum energy E_0 that a sensor must consume for a successful transmission is:

Figure BDA0003207391850000072

where ζ is the minimum SNR threshold, σ^2 is the variance of the white Gaussian noise, ΔT is the time required to transmit a data packet, and h_t is the channel coefficient in period t.

To use energy sustainably, every sensor consumes energy E_0 per uplink transmission. Suppose the sensor transmits z_t data packets in total during period t. Then, at the end of period t, the energy consumption model

Figure BDA0003207391850000073

is:

Figure BDA0003207391850000074

In the formula, E_t is the energy stored in the supercapacitor before period t begins, z_t is the number of data packets sent during period t, E_0 is the minimum energy consumed per transmission, CE_t is the energy replenished to the capacitor by energy harvesting during period t, and T is the duration of period t.
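The formula image for the energy update is not reproduced here. A minimal sketch consistent with the listed symbols is given below: it assumes the stored energy after period t equals E_t minus the transmission cost z_t · E_0 plus the harvested energy CE_t · T, capped at the supercapacitor capacity C_max. The cap, the outage flag and the function name are assumptions added for illustration.

```python
def update_energy(e_t, z_t, e0, ce_t, period, c_max):
    """Return (stored energy after the period, outage flag).

    e_t    : energy stored before the period (J)
    z_t    : number of packets sent during the period
    e0     : energy consumed per successful uplink transmission (J)
    ce_t   : harvesting rate during the period (J per ms)
    period : period length T (ms)
    c_max  : supercapacitor capacity (J); capping here is an assumption
    """
    e_next = e_t - z_t * e0 + ce_t * period
    outage = e_next < 0.0  # the sensor ran out of energy during the period
    return min(max(e_next, 0.0), c_max), outage


# 10 J stored, 3 packets at 2 J each, 0.5 J/ms harvested over 4 ms.
energy, outage = update_energy(10.0, 3, 2.0, 0.5, 4.0, 20.0)
```

An outage in this sketch corresponds to one increment of the energy outage count n_dead used in the optimization constraint C1.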

As a further preferred implementation, in step S5 above, optimizing the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model and the throughput optimization model, to obtain an energy sustainability system, includes:

building optimization neural networks for the smart sensor;

calculating the remaining energy of the smart sensor according to the energy consumption model;

inputting the data transmission probability and the remaining energy into the optimization neural networks to obtain the energy sustainable network.

Four neural networks are built for the smart sensor: an execution policy network, which selects the contention window value; an execution evaluation network, which evaluates the contention window value; a target policy network, which stabilizes training and supplies the contention window value used to update the execution evaluation network; and a target evaluation network, which supplies the next-state value used to update the execution evaluation network. With the data transmission probability determined, the remaining energy of the smart sensor is calculated through the energy consumption model, and the channel state information, the distance and movement information, the harvested energy and the remaining energy are fed into the execution policy network as observations, yielding the optimized wireless local area network, that is, the energy sustainable network.

As a further preferred implementation, in step S6 above, obtaining the contention window of the smart sensor through the output of the energy sustainable network includes:

initializing the energy sustainable network and determining an initialization system;

randomly generating an initial contention window for the smart sensor in the initialization system;

updating the initial contention window with the environmental reward and outputting a target contention window.

Here, the experience replay pool, the contention windows of the ordinary sensors, and the system environment of the energy sustainable network are initialized. Each ordinary sensor independently and randomly sets its initial contention window size within a certain range, and the contention window value of the smart sensor in the initialization system is generated randomly. According to the energy consumption model, energy can be used sustainably provided that every sensor consumes energy E_0 when transmitting a single data packet uplink. All sensors then send data packets according to the SEH-CSMA/CA protocol. After a certain time period, the number of data packets the smart sensor transmitted successfully and its energy interruptions are counted, and its throughput in that stage is calculated. The current reward r is calculated according to the reward formula. The smart sensor uses the mean square error as the loss function of the execution evaluation network and updates that network by backpropagation. At the same time, the states of all sensors at the current moment and the contention window value of the smart sensor are input into the execution evaluation network to obtain the state-action value, and this value is used to update the execution strategy network by backpropagation. In addition, the target strategy network and the target evaluation network are updated gradually by copying a certain proportion of the parameters at each step. By iterating this process repeatedly, the smart sensor adapts to changes in the dynamic conditions and obtains its contention window size.
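The update steps in this paragraph (storing experience, a mean-square-error loss for the execution evaluation network, and copying a fixed proportion of parameters into the target networks at each step) can be sketched as below. This is a hedged sketch of the standard soft-update pattern; the proportion `tau`, the discount `gamma`, the flat parameter lists, and the class and function names are assumptions rather than values from the patent.

```python
import random
from collections import deque


class ReplayPool:
    """Fixed-size experience pool; the oldest transitions are dropped first."""

    def __init__(self, capacity, seed=0):
        self._items = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state):
        self._items.append((state, action, reward, next_state))

    def sample(self, batch_size):
        items = list(self._items)
        return self._rng.sample(items, min(batch_size, len(items)))

    def __len__(self):
        return len(self._items)


def td_target(reward, next_value, gamma=0.99):
    """Target for the evaluation network: r + gamma * Q_target(s', a')."""
    return reward + gamma * next_value


def mse_loss(q_values, targets):
    """Mean square error between predicted values and their targets."""
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(q_values)


def soft_update(target_params, source_params, tau=0.01):
    """Copy a proportion tau of the source parameters into the target."""
    return [tau * s + (1.0 - tau) * t
            for s, t in zip(source_params, target_params)]
```

A small `tau` keeps the target networks changing slowly, which is what lets them stabilize training.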

As a further preferred implementation, establishing a throughput optimization model according to the throughput of the sensing module includes:

The optimization model is:

Figure BDA0003207391850000081

s.t. C1: n_dead ≤ 0,

Figure BDA0003207391850000082

Figure BDA0003207391850000083

where d denotes distance, t denotes time, α_t denotes the discount factor, η_t denotes the throughput of the smart sensor, n_dead denotes the number of energy interruptions of the smart sensor, CE_min denotes the minimum replenishment energy of the sensing module, CE_j denotes the replenishment energy of the j-th sensor in the sensing module, CE_max denotes the maximum replenishment energy of the sensing module, d_min denotes the minimum distance between the sensing module and the gateway, d_j denotes the distance between the j-th sensor in the sensing module and the gateway, d_max denotes the maximum distance between the sensing module and the gateway, j is the sensor index, and n is the number of sensors in the sensing module.
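Because the objective and the last two constraints appear only as figures, the sketch below is a reading inferred from the symbol definitions, not a transcription. It assumes the objective is a discounted sum of throughput, that the second constraint bounds each CE_j between CE_min and CE_max, and that the third bounds each d_j between d_min and d_max. The function names are illustrative.

```python
def discounted_throughput(eta, alpha):
    """Assumed objective: sum over t of alpha_t * eta_t."""
    if len(eta) != len(alpha):
        raise ValueError("eta and alpha must have the same length")
    return sum(a * e for a, e in zip(alpha, eta))


def is_feasible(n_dead, ce, d, ce_min, ce_max, d_min, d_max):
    """Check C1 plus the assumed box constraints on CE_j and d_j.

    ce and d hold one value per sensor j = 1..n.
    """
    if n_dead > 0:  # C1: no energy interruptions allowed
        return False
    energy_ok = all(ce_min <= x <= ce_max for x in ce)   # assumed constraint
    distance_ok = all(d_min <= x <= d_max for x in d)    # assumed constraint
    return energy_ok and distance_ok
```

Since an interruption count cannot be negative, C1 amounts to requiring that the smart sensor never runs out of energy.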

As a further preferred implementation, updating the initial contention window with the environmental reward and outputting a target contention window includes:

The environmental reward r_t is expressed as:

Figure BDA0003207391850000084

where η_t denotes the throughput of the smart sensor and n_dead denotes the number of energy interruptions of the smart sensor.
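The reward formula itself is available only as a figure, so its exact form cannot be reproduced here. As a placeholder for experimentation, the sketch below uses a hypothetical reward that grows with throughput and is penalized for each energy interruption. The linear form and the `penalty` weight are assumptions, not the patent's formula.

```python
def environment_reward(eta_t, n_dead, penalty=10.0):
    """Hypothetical reward: throughput minus a penalty per energy interruption.

    eta_t   : throughput of the smart sensor in the current stage
    n_dead  : number of energy interruptions in the current stage
    penalty : assumed weight, chosen only for illustration
    """
    if n_dead < 0:
        raise ValueError("n_dead must be non-negative")
    return eta_t - penalty * n_dead
```

Any reward with the same two inputs could be substituted; what matters for the training loop is that interruptions lower the return while throughput raises it.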

With reference to Figure 1, the process of the present invention specifically includes:

A sensor wireless local area network consisting of a gateway, ordinary sensors, and smart sensors is established, and the sensors transmit data in this network. Because data may collide during transmission, a Markov-chain-based state transition model is analyzed mathematically to obtain the probability that each sensor transmits data in the time-varying environment. An energy consumption model is established under the condition that the data receiving power of the sensor is fixed at P_0, and a throughput optimization model is established based on the throughput of the sensor. With the data transmission probability of the sensor determined, the sensor wireless local area network is optimized according to the energy consumption model and the throughput optimization model to obtain an energy sustainable network. The energy sustainable network is then calculated and updated to obtain the contention window of the smart sensor.
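To illustrate how a Markov-chain backoff model yields a transmission probability, the sketch below solves the classical saturated CSMA/CA fixed point for n identical stations. It is a swapped-in textbook model, not the patent's SEH-CSMA/CA chain, which also has to account for energy states and for sensors with different contention windows. The parameter names `w` (minimum contention window) and `m` (maximum backoff stage) are assumptions.

```python
def tau_given_p(p, w, m):
    """Transmission probability for collision probability p.

    Uses the series form 2 / (1 + w + p*w*sum_{k<m} (2p)^k), which avoids
    the 0/0 of the closed form at p = 0.5.
    """
    series = sum((2.0 * p) ** k for k in range(m))
    return 2.0 / (1.0 + w + p * w * series)


def collision_prob(tau, n):
    """Probability that at least one of the other n - 1 stations transmits."""
    return 1.0 - (1.0 - tau) ** (n - 1)


def solve_transmission_probability(n, w, m, tol=1e-12):
    """Find tau with tau = tau_given_p(collision_prob(tau, n)) by bisection.

    The residual tau - tau_given_p(...) is increasing in tau, negative at 0
    and positive at 1, so bisection on [0, 1] converges.
    """
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if mid - tau_given_p(collision_prob(mid, n), w, m) > 0.0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    tau = (lo + hi) / 2.0
    return tau, collision_prob(tau, n)


tau, p = solve_transmission_probability(n=10, w=32, m=5)
```

With a single station there are no collisions, so the solution reduces to 2 / (w + 1); adding stations raises the collision probability and lowers the per-slot transmission probability.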

An embodiment of the present invention also provides an Industrial Internet of Things energy sustainability decision system based on deep reinforcement learning, including:

a first unit, configured to establish a sensor wireless local area network, wherein the sensor wireless local area network includes a gateway and a sensing module, and the sensing module includes ordinary sensors and smart sensors;

a second unit, configured to establish a Markov-chain-based state transition model according to the data transmission collision probability of the sensing module to obtain the data transmission probability of the sensing module;

a third unit, configured to establish an energy consumption model according to the data receiving power of the sensing module;

a fourth unit, configured to establish a throughput optimization model according to the throughput of the sensing module, wherein the throughput characterizes the amount of data in the packets sent by the sensing module within a certain period of time;

a fifth unit, configured to optimize the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model, and the throughput optimization model to obtain an energy sustainable network;

a sixth unit, configured to obtain the contention window of the smart sensor through the output of the energy sustainable network.

Corresponding to the method of Figure 1, an embodiment of the present invention further provides an electronic device including a processor and a memory; the memory is used to store a program, and the processor executes the program to implement the method described above.

Corresponding to the method of Figure 1, an embodiment of the present invention further provides a computer-readable storage medium that stores a program, and the program is executed by a processor to implement the method described above.

An embodiment of the present invention also discloses a computer program product or computer program that includes computer instructions stored in a computer-readable storage medium. A processor of a computer device can read the computer instructions from the computer-readable storage medium and execute them, so that the computer device performs the method shown in Figure 1.

In summary, the embodiments of the present invention have the following advantages:

(1) The embodiments of the present invention analyze the data transmission probability of each sensor with a Markov-chain-based state transition model, which can improve the accuracy of the system.

(2) The embodiments of the present invention optimize the sensor wireless local area network through a throughput optimization model, which can improve the throughput of the system.

In some alternative embodiments, the functions/operations noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, depending on the functions/operations involved, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order. In addition, the embodiments presented and described in the flowcharts of the present invention are provided by way of example in order to give a more complete understanding of the technology. The disclosed method is not limited to the operations and logic flow presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of a larger operation are performed independently.

In addition, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated, one or more of the described functions and/or features may be integrated into a single physical device and/or software module, or one or more functions and/or features may be implemented in separate physical devices or software modules. It should also be understood that a detailed discussion of the actual implementation of each module is not necessary for understanding the present invention. Rather, given the attributes, functions, and internal relationships of the various functional modules in the device disclosed herein, the actual implementation of the modules is within the routine skill of an engineer. Therefore, those skilled in the art can, using ordinary skill, implement the present invention as set forth in the claims without undue experimentation. It should also be understood that the specific concepts disclosed are merely illustrative and are not intended to limit the scope of the present invention, which is determined by the full scope of the appended claims and their equivalents.

If the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be regarded as an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device.

More specific examples (a non-exhaustive list) of computer-readable media include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that the various parts of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following technologies known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field programmable gate array (FPGA), and so on.

In the description of this specification, references to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" mean that the specific features, structures, materials, or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the present invention. In this specification, such references do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.

Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations may be made to these embodiments without departing from the principles and spirit of the present invention, and that the scope of the present invention is defined by the claims and their equivalents.

The above is a specific description of the preferred implementations of the present invention, but the present invention is not limited to the described embodiments. Those skilled in the art may make various equivalent modifications or substitutions without departing from the spirit of the present invention, and all such equivalent modifications or substitutions fall within the scope defined by the claims of this application.

Claims (7)

1. A deep reinforcement learning-based Industrial Internet of Things energy sustainability decision method, characterized by comprising the following steps:
establishing a sensor wireless local area network, wherein the sensor wireless local area network comprises a gateway and a sensing module, and the sensing module comprises an ordinary sensor and a smart sensor;
establishing a Markov chain-based state transition model according to the data transmission collision probability of the sensing module to obtain the data transmission probability of the sensing module;
establishing an energy consumption model according to the data receiving power of the sensing module;
establishing a throughput optimization model according to the throughput of the sensing module, wherein the throughput is used for representing the data volume of the data packets sent by the sensing module within a certain time;
optimizing the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model and the throughput optimization model to obtain an energy sustainable network;
obtaining a contention window of the smart sensor through the energy sustainable network output;
wherein the optimizing the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model and the throughput optimization model to obtain an energy sustainable network comprises:
building an optimization neural network for the smart sensor;
calculating the remaining energy of the smart sensor according to the energy consumption model;
inputting the data transmission probability and the remaining energy into the optimization neural network to obtain the energy sustainable network;
the obtaining of the contention window of the smart sensor through the energy sustainable network output comprises:
initializing the energy sustainable network, and determining an initialization system;
randomly generating an initial contention window for the smart sensor in the initialization system;
updating the initial contention window with the environment reward and outputting a target contention window;
the updating of the initial contention window with the environment reward and the outputting of the target contention window comprise:
the environment reward r_t is expressed as:
Figure FDA0004056867890000011
wherein η_t denotes the throughput of the smart sensor, and n_dead denotes the number of energy interruptions of the smart sensor.
2. The deep reinforcement learning-based Industrial Internet of Things energy sustainability decision method according to claim 1, wherein the establishing a Markov chain-based state transition model according to the data transmission collision probability of the sensing module to obtain the data transmission probability of the sensing module comprises:
determining the data transmission collision probability of the sensing module when the sensing module transmits data in the sensor wireless local area network;
simulating the data transmission collision process of the sensing module by combining the data transmission collision probability with a discrete-time Markov chain, and determining the Markov chain-based state transition model;
applying the normalization condition to the Markov chain-based state transition model to obtain the data transmission probability of the sensing module.
3. The deep reinforcement learning-based Industrial Internet of Things energy sustainability decision method according to claim 1, wherein the establishing of the energy consumption model according to the data receiving power of the sensing module comprises:
calculating a signal-to-noise ratio threshold of the sensing module according to the receiving power of the sensing module;
calculating the energy consumption required for a successful transmission by the sensing module according to the signal-to-noise ratio threshold;
establishing the energy consumption model according to the energy consumption.
4. The deep reinforcement learning-based Industrial Internet of Things energy sustainability decision method according to claim 1, wherein the establishing of a throughput optimization model according to the throughput of the sensing module comprises:
the optimization model is as follows:
Figure FDA0004056867890000021
s.t. C1: n_dead ≤ 0,
Figure FDA0004056867890000022
Figure FDA0004056867890000023
wherein d denotes distance, t denotes time, α_t denotes the discount factor, η_t denotes the throughput of the smart sensor, n_dead denotes the number of energy interruptions of the smart sensor, CE_min denotes the minimum supplemental energy of the sensing module, CE_j denotes the supplemental energy of the j-th sensor in the sensing module, CE_max denotes the maximum supplemental energy of the sensing module, d_min denotes the minimum distance between the sensing module and the gateway, d_j denotes the distance between the j-th sensor in the sensing module and the gateway, d_max denotes the maximum distance between the sensing module and the gateway, j is the sensor index, and n is the number of sensors in the sensing module.
5. An Industrial Internet of Things energy sustainability decision system based on deep reinforcement learning, comprising:
a first unit, configured to establish a sensor wireless local area network, wherein the sensor wireless local area network comprises a gateway and a sensing module, and the sensing module comprises an ordinary sensor and a smart sensor;
a second unit, configured to establish a Markov chain-based state transition model according to the data transmission collision probability of the sensing module to obtain the data transmission probability of the sensing module;
a third unit, configured to establish an energy consumption model according to the data receiving power of the sensing module;
a fourth unit, configured to establish a throughput optimization model according to the throughput of the sensing module, wherein the throughput is used to characterize the data volume of the data packets sent by the sensing module within a certain time;
a fifth unit, configured to optimize the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model, and the throughput optimization model, so as to obtain an energy sustainable network;
a sixth unit, configured to obtain a contention window of the smart sensor through the energy sustainable network output;
wherein the fifth unit being configured to optimize the sensor wireless local area network according to the data transmission probability of the sensing module, the energy consumption model, and the throughput optimization model to obtain an energy sustainable network comprises:
building an optimization neural network for the smart sensor;
calculating the remaining energy of the smart sensor according to the energy consumption model;
inputting the data transmission probability and the remaining energy into the optimization neural network to obtain the energy sustainable network;
the sixth unit being configured to obtain a contention window of the smart sensor through the energy sustainable network output comprises:
initializing the energy sustainable network, and determining an initialization system;
randomly generating an initial contention window for the smart sensor in the initialization system;
updating the initial contention window with the environment reward and outputting a target contention window;
the updating of the initial contention window with the environment reward and the outputting of the target contention window comprise:
the environment reward r_t is expressed as:
Figure FDA0004056867890000031
wherein η_t denotes the throughput of the smart sensor, and n_dead denotes the number of energy interruptions of the smart sensor.
6. An electronic device, comprising a processor and a memory;
the memory is used for storing a program;
the processor executes the program to implement the method according to any one of claims 1-4.
7. A computer-readable storage medium, characterized in that the storage medium stores a program, and the program is executed by a processor to implement the method according to any one of claims 1-4.
CN202110920967.1A 2021-08-11 2021-08-11 Energy Sustainability Decision-Making Mechanism for Industrial Internet of Things Based on Deep Reinforcement Learning Active CN113747384B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110920967.1A CN113747384B (en) 2021-08-11 2021-08-11 Energy Sustainability Decision-Making Mechanism for Industrial Internet of Things Based on Deep Reinforcement Learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110920967.1A CN113747384B (en) 2021-08-11 2021-08-11 Energy Sustainability Decision-Making Mechanism for Industrial Internet of Things Based on Deep Reinforcement Learning

Publications (2)

Publication Number Publication Date
CN113747384A CN113747384A (en) 2021-12-03
CN113747384B true CN113747384B (en) 2023-04-07

Family

ID=78730740

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110920967.1A Active CN113747384B (en) 2021-08-11 2021-08-11 Energy Sustainability Decision-Making Mechanism for Industrial Internet of Things Based on Deep Reinforcement Learning

Country Status (1)

Country Link
CN (1) CN113747384B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111278161A (en) * 2020-01-19 2020-06-12 电子科技大学 WLAN protocol design and optimization method based on energy collection and deep reinforcement learning

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100943175B1 (en) * 2007-11-30 2010-02-19 한국전자통신연구원 Wireless Sensor Network and its Control Method Using Dynamic-based Message Delivery Method
CN105792253B (en) * 2016-02-25 2019-03-29 安徽农业大学 A kind of wireless sense network medium access control optimization method
CN110972162B (en) * 2019-11-22 2022-03-25 南京航空航天大学 A Markov Chain Based Saturation Throughput Solution for Underwater Acoustic Sensor Networks

Also Published As

Publication number Publication date
CN113747384A (en) 2021-12-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant