CN113537625A - Data sharing method considering energy consumption efficiency in power internet of things based on block chain - Google Patents


Publication number
CN113537625A
Authority
CN
China
Prior art keywords
data
preset
current
block chain
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110873613.6A
Other languages
Chinese (zh)
Inventor
蔡婷
蔡宇
闫会峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Yitong College
Original Assignee
Chongqing Yitong College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Yitong College
Priority to CN202110873613.6A
Publication of CN113537625A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/047 Optimisation of routes or paths, e.g. travelling salesman problem
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/64 Protecting data integrity, e.g. using checksums, certificates or signatures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Y INFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y10/00 Economic sectors
    • G16Y10/35 Utilities, e.g. electricity, gas or water
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Y INFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y20/00 Information sensed or collected by the things
    • G16Y20/40 Information sensed or collected by the things relating to personal data, e.g. biometric data, records or preferences
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Y INFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y40/00 IoT characterised by the purpose of the information processing
    • G16Y40/50 Safety; Security of things, users, data or systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Human Resources & Organizations (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Tourism & Hospitality (AREA)
  • Development Economics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Bioethics (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Marketing (AREA)
  • Artificial Intelligence (AREA)
  • Water Supply & Treatment (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Accounting & Taxation (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a block chain based data sharing method for the power Internet of Things that takes energy consumption efficiency into account. It addresses the technical problems of existing data sharing methods: low security and integrity of data transmission, a low data sharing rate and high energy consumption. The method includes: training an optimal routing inspection path at an inspection site of the power system; inspecting the power system along the optimal routing inspection path to obtain sensing data; and uploading the sensing data to a preset block chain.

Description

Data sharing method considering energy consumption efficiency in power internet of things based on block chain
Technical Field
The invention relates to the technical field of data sharing, in particular to a data sharing method considering energy consumption efficiency in an electric power Internet of things based on a block chain.
Background
With the deep integration of Internet of Things technology and the smart grid, intelligent mining of massive user-side data, secure sharing of energy data, real-time processing of inspection data and wide interconnection of edge data in the power system have all gained importance. Building a ubiquitous power Internet of Things with comprehensive state sensing and ubiquitous data connection is a current research hotspot, and sensing and securely sharing massive data in the power Internet of Things has become a key problem to be solved urgently.
Large-scale power systems operate for long periods in harsh outdoor (or indoor) environments, where power transmission and distribution equipment frequently develops abnormal conditions. The traditional mode of regular manual inspection involves a heavy workload, low efficiency and high investment cost, and long-duration inspection work in complex environments can threaten the safety of operators. Inspection by intelligent robots has therefore become a safe and efficient alternative.
However, how to collect high-quality data at a power system site using mobile intelligent devices (such as inspection robots and unmanned aerial vehicles) and share that data securely and in real time remains an urgent problem. On the one hand, because the endurance and sensing range of a Mobile Smart Terminal (MST) are limited, it is important to determine how to sense more high-quality data. On the other hand, how to share the sensed data with other MSTs and promote mutual understanding and cooperation among devices, so that assigned tasks are completed better, is another key problem to be solved urgently.
The traditional centralized power Internet of Things architecture poses serious challenges for data privacy, real-time data sharing and low-latency transmission. In addition, existing robot inspection relies mainly on manual operation and does not consider optimizing the robot's inspection path under limited endurance in order to improve the data sharing rate.
In summary, existing data sensing and sharing in the power Internet of Things has the following disadvantages. First, when high-definition data collected by mobile intelligent devices at a power system site are transmitted back to a base for processing through a centralized power Internet of Things system, the security and integrity of the data cannot be guaranteed. Second, on-site inspection by manual operation cannot fully account for the limited endurance of the unmanned aerial vehicle. Finally, existing motion path optimization algorithms for mobile intelligent devices have no autonomous learning capability and cannot minimize the device's own energy consumption cost while sharing as much data as possible.
Disclosure of Invention
The invention provides a data sharing method considering energy consumption efficiency in an electric power Internet of things based on a block chain, which is used for solving the technical problems of low data transmission safety and integrity, low data sharing rate and high energy consumption of the existing data sharing method.
The invention provides a data sharing method of an electric power Internet of things, which is applied to mobile intelligent equipment and comprises the following steps:
training an optimal routing inspection path at a routing inspection site of the power system;
patrolling the power system based on the optimal patrolling path to obtain sensing data;
and uploading the sensing data to a preset block chain.
Optionally, the step of training an optimal patrol route at a patrol site of the power system includes:
acquiring the current environmental state of the power system at the current moment of the inspection site;
inputting the current environment state into a preset first network to obtain a current output action;
calculating rewards according to the current output action and the current environment state, and generating a next environment state;
storing the current environmental state, the current output action, the reward, and the next environmental state in a preset playback pool;
judging whether the current time is equal to a preset time or not, if not, adopting the time corresponding to the next environment state as the current time, and returning to the step of inputting the current environment state into a preset first network to obtain a current output action until the current time is equal to the preset time;
acquiring a training sample from the playback pool, calculating a target value of the training sample through a preset second network, and obtaining a strategy gradient based on the target value;
optimizing the second network based on the target value to obtain a second optimization parameter;
optimizing the first network by adopting the strategy gradient to obtain a first optimization parameter;
and generating an optimal routing inspection path based on the first optimization parameter and the second optimization parameter.
Optionally, the step of uploading the sensing data to a preset block chain includes:
obtaining a private key;
signing the perception data by adopting the private key to obtain encrypted data;
sending the encrypted data to a preset authentication and authorization center for verification;
and when receiving verification passing information returned by the authentication and authorization center, sending the encrypted data to a preset block chain.
Optionally, the step of sending the encrypted data to a preset block chain when receiving verification passing information returned by the authentication and authorization center includes:
when verification passing information returned by the authentication and authorization center is received, a storage request is sent to the block chain; the storage request carries the encrypted data; and the block chain is used for verifying the encrypted data and storing the sensing data when the verification is passed.
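The sign, verify and store sequence described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: `DEVICE_KEY`, `sign`, `authority_verify` and `upload` are hypothetical names, a Python list stands in for the block chain, and HMAC-SHA256 with a shared secret is used as a stand-in for the private-key signature so that the example runs with the standard library alone. A real deployment would use an asymmetric signature scheme.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret standing in for the MST's private key.
# A real deployment would use an asymmetric signature scheme instead.
DEVICE_KEY = b"demo-device-key"


def sign(payload: dict, key: bytes) -> dict:
    """Attach an HMAC-SHA256 tag to the serialized sensing data."""
    body = json.dumps(payload, sort_keys=True).encode()
    tag = hmac.new(key, body, hashlib.sha256).hexdigest()
    return {"body": body.decode(), "tag": tag}


def authority_verify(message: dict, key: bytes) -> bool:
    """Stand-in for the authentication and authorization center."""
    expected = hmac.new(key, message["body"].encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["tag"])


def upload(message: dict, key: bytes, ledger: list) -> bool:
    """Send a storage request only after the authority reports success."""
    if not authority_verify(message, key):
        return False
    ledger.append(message)  # the list stands in for the block chain
    return True


ledger = []
signed = sign({"device": "MST-1", "temperature_c": 41.5}, DEVICE_KEY)
accepted = upload(signed, DEVICE_KEY, ledger)

# A message whose body was altered after signing must be refused.
tampered = dict(signed, body=signed["body"].replace("41.5", "20.0"))
rejected = not upload(tampered, DEVICE_KEY, ledger)
```

In this sketch the unmodified message is stored, while the altered copy fails verification and never reaches the ledger.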
The invention also provides a data sharing device of the power Internet of things, which is applied to mobile intelligent equipment, and the device comprises:
the optimal routing inspection path training module is used for training an optimal routing inspection path on a routing inspection site of the power system;
the perception data acquisition module is used for patrolling the power system based on the optimal patrolling path to acquire perception data;
and the uploading module is used for uploading the sensing data to a preset block chain.
Optionally, the optimal patrol path training module includes:
the current environment state acquisition submodule is used for acquiring the current environment state of the power system at the current moment of the inspection site;
the current output action acquisition submodule is used for inputting the current environment state into a preset first network to obtain a current output action;
the reward calculation submodule is used for calculating reward according to the current output action and the current environment state and generating the next environment state;
a storage submodule, configured to store the current environment state, the current output action, the reward, and the next environment state in a preset playback pool;
the circulation submodule is used for judging whether the current time is equal to a preset time or not, if not, the time corresponding to the next environment state is taken as the current time, and the step of inputting the current environment state into a preset first network to obtain the current output action is returned until the current time is equal to the preset time;
the strategy gradient generation submodule is used for acquiring a training sample from the playback pool, calculating a target value of the training sample through a preset second network, and obtaining a strategy gradient based on the target value;
a second optimization parameter obtaining submodule, configured to optimize the second network based on the target value, so as to obtain a second optimization parameter;
a first optimization parameter obtaining submodule, configured to optimize the first network by using the policy gradient to obtain a first optimization parameter;
and the optimal routing inspection path generation submodule is used for generating an optimal routing inspection path based on the first optimization parameter and the second optimization parameter.
Optionally, the upload module includes:
the private key obtaining sub-module is used for obtaining a private key;
the encryption submodule is used for signing the sensing data by adopting the private key to obtain encrypted data;
the verification sub-module is used for sending the encrypted data to a preset authentication authorization center for verification;
and the uploading sub-module is used for sending the encrypted data to a preset block chain when receiving verification passing information returned by the authentication and authorization center.
Optionally, the upload sub-module includes:
the uploading unit is used for sending a storage request to the block chain when receiving verification passing information returned by the authentication and authorization center; the storage request carries the encrypted data; and the block chain is used for verifying the encrypted data and storing the sensing data when the verification is passed.
The invention also provides an electronic device comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is used for executing the data sharing method of the power internet of things according to instructions in the program codes.
The invention also provides a computer readable storage medium for storing program code for executing the data sharing method of the power internet of things as described in any one of the above.
According to the above technical solution, the invention has the following advantages: the optimal routing inspection path of the mobile intelligent equipment at the inspection site of the power system is trained with a preset deterministic policy gradient algorithm, so that the mobile intelligent equipment can share as much data as possible while reducing its energy consumption cost. In addition, uploading the sensing data collected during inspection to a block chain for storage improves the security and integrity of data storage.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart illustrating the steps of a data sharing method for the power Internet of Things according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a deep reinforcement learning process according to an embodiment of the present invention;
Fig. 3 is a flowchart illustrating the steps of training an optimal routing inspection path according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of data sensing based on multi-agent DRL according to an embodiment of the present invention;
Fig. 5 is an MST training framework provided by an embodiment of the present invention;
Fig. 6 is a flowchart illustrating the steps of uploading sensing data to a preset block chain according to an embodiment of the present invention;
Fig. 7 is a block diagram of a data sharing apparatus for the power Internet of Things according to an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a data sharing method considering energy consumption efficiency in an electric power Internet of things based on a block chain, which is used for solving the technical problems of low data transmission safety and integrity, low data sharing rate and high energy consumption in the existing data sharing method.
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating steps of a data sharing method for an internet of things of electric power according to an embodiment of the present invention.
The invention provides a data sharing method of an electric power Internet of things, which is applied to mobile intelligent equipment, wherein the mobile intelligent equipment can comprise an inspection robot and an unmanned aerial vehicle, and the method specifically comprises the following steps:
101, training an optimal routing inspection path on a routing inspection site of a power system;
This step addresses two problems: manual on-site inspection cannot fully account for the limited endurance of the unmanned aerial vehicle, and existing motion path optimization algorithms for mobile intelligent equipment have no autonomous learning capability and cannot minimize the device's own energy consumption cost while sharing as much data as possible. To this end, the embodiment of the invention models the learning of the optimal routing inspection path of the mobile intelligent equipment as a Markov Decision Process (MDP) with a continuous action space. Each mobile smart terminal (MST) is an agent whose goal is to maximize its overall reward by learning the optimal inspection path policy through interaction with the environment. Because the invention targets distributed, continuous multi-agent data sensing and sharing, traditional DRL (Deep Reinforcement Learning) algorithms cannot meet its requirements. The embodiment of the invention therefore proposes a multi-agent DRL solution based on DDPG (deep deterministic policy gradient) to train optimal paths among multiple agents. Specifically, MSTs that communicate with each other have a mixed cooperative-competitive relationship: a single MST can cooperate with other MSTs to complete field data sensing of the power system, but because the total number of data points is relatively fixed, MSTs also compete with each other to maximize their own rewards. In one example, the multi-agent DRL can adopt a framework of centralized training and decentralized execution. During training, additional information from some MSTs (e.g., actions, rewards, training periods) may be used to train the cooperative learning strategies of other MSTs. During execution, however, the private information of other MSTs is not used, which preserves the independence and autonomy of each MST.
A Markov decision process is a mathematical model of sequential decision-making, used to simulate the stochastic policies and returns achievable by an agent in an environment whose system state has the Markov property. An MDP is built on a pair of interacting objects, the agent and the environment, and its elements include states, actions, policies and rewards. In an MDP, the agent perceives the current system state and acts on the environment according to its policy, thereby changing the state of the environment and receiving a reward; the accumulation of rewards over time is called the return.
DRL combines the perception ability of Deep Learning (DL) with the decision-making ability of Reinforcement Learning (RL). It can perform control directly from input state information and is an artificial intelligence method closer to the human way of thinking. The learning process is outlined in Fig. 2:
At each moment, the agent (a software or hardware entity capable of autonomous activity, such as the mobile intelligent device in the embodiment of the present invention) interacts with the environment to obtain a high-dimensional observation and uses a DL method to perceive the observation and obtain a specific state feature representation. It evaluates the value function of each action based on the expected reward and maps the current state to a corresponding action through a certain policy. The environment reacts to this action and produces the next observation. By continuously repeating this cycle, the optimal policy for achieving the goal is eventually obtained.
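The observe, act and reward cycle can be illustrated with a minimal sketch. The environment, policy and reward below are hypothetical placeholders chosen only to make the loop runnable; they are not the power system model or the learned policy of the embodiment.

```python
import random


class ToyEnvironment:
    """Hypothetical 1-D corridor: the agent is rewarded while it stands on cell 5."""

    def __init__(self):
        self.position = 0

    def observe(self) -> int:
        return self.position

    def step(self, action: int) -> float:
        # action is -1 or +1; positions are clamped to [0, 5]
        self.position = max(0, min(5, self.position + action))
        return 1.0 if self.position == 5 else 0.0


def policy(state: int, rng: random.Random) -> int:
    """Placeholder policy: mostly move right, sometimes explore."""
    return 1 if rng.random() < 0.8 else -1


def run_episode(steps: int, seed: int = 0) -> float:
    rng = random.Random(seed)
    env = ToyEnvironment()
    total_reward = 0.0
    for _ in range(steps):
        state = env.observe()              # observation -> state
        action = policy(state, rng)        # state -> action
        total_reward += env.step(action)   # environment reacts and returns a reward
    return total_reward


episode_return = run_episode(steps=20)
```

The value `episode_return` is the accumulated reward of one episode, that is, the return in MDP terms.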
102, patrolling the power system based on the optimal patrolling path to obtain sensing data;
In the embodiment of the invention, the mobile intelligent equipment can collect field data of the power system (such as substations and power transmission lines) and obtain the running condition of the equipment.
103, uploading the sensing data to a preset block chain.
A block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. It is characterized by decentralization, tamper resistance, full-process traceability, collective maintenance, openness and transparency.
In order to provide a safe data storage and sharing service, the mobile intelligent device in the embodiment of the invention can realize distributed storage and sharing of the sensed field data through the blockchain with the help of the edge server.
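The tamper resistance mentioned above comes from each block committing to the hash of its predecessor. The following is a minimal sketch of that idea only; the function names and record fields are hypothetical, and consensus, peer-to-peer distribution and the edge server are deliberately left out.

```python
import hashlib
import json


def block_hash(index: int, data: dict, prev_hash: str) -> str:
    """Hash the block contents together with the previous block's hash."""
    record = json.dumps({"index": index, "data": data, "prev": prev_hash},
                        sort_keys=True)
    return hashlib.sha256(record.encode()).hexdigest()


def append_block(chain: list, data: dict) -> None:
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({"index": index, "data": data, "prev": prev_hash,
                  "hash": block_hash(index, data, prev_hash)})


def is_valid(chain: list) -> bool:
    """Recompute every hash and check that the links are unbroken."""
    prev_hash = "0" * 64
    for block in chain:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev"]):
            return False
        prev_hash = block["hash"]
    return True


chain = []
append_block(chain, {"device": "MST-1", "reading": 0.93})
append_block(chain, {"device": "MST-2", "reading": 0.88})
valid_before = is_valid(chain)

chain[0]["data"]["reading"] = 0.10   # tamper with a stored record
valid_after = is_valid(chain)
```

After the stored reading is altered, the recomputed hash of the first block no longer matches, so validation fails.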
The optimal routing inspection path of the mobile intelligent equipment at the inspection site of the power system is trained with a preset deterministic policy gradient algorithm, so that the mobile intelligent equipment can share as much data as possible while reducing its energy consumption cost. In addition, uploading the sensing data collected during inspection to a block chain for storage improves the security and integrity of data storage.
Referring to fig. 3, fig. 3 is a flowchart illustrating a procedure for training an optimal routing inspection path according to an embodiment of the present invention.
In the embodiment of the present invention, the step of training the optimal routing inspection path at the routing inspection site of the power system may specifically include:
S31, acquiring the current environment state of the power system at the current moment of the inspection site;
S32, inputting the current environment state into a preset first network to obtain the current output action;
S33, calculating a reward according to the current output action and the current environment state, and generating the next environment state;
S34, storing the current environment state, the current output action, the reward and the next environment state in a preset playback pool;
S35, judging whether the current time is equal to a preset time; if not, taking the time corresponding to the next environment state as the current time and returning to the step of inputting the current environment state into the preset first network to obtain the current output action, until the current time is equal to the preset time;
S36, obtaining training samples from the playback pool, calculating the target value of the training samples through a preset second network, and obtaining a policy gradient based on the target value;
S37, optimizing the second network based on the target value to obtain a second optimization parameter;
S38, optimizing the first network by using the policy gradient to obtain a first optimization parameter;
and S39, generating an optimal routing inspection path based on the first optimization parameter and the second optimization parameter.
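The data collection part of these steps (S31 to S35) and the sampling in S36 can be sketched as follows. The sketch is illustrative only: `actor` and `env_step` are hypothetical placeholders for the first network and the environment, and `T_MAX`, `POOL_SIZE` and `BATCH_SIZE` are assumed values. The network updates of S36 to S39 are not shown.

```python
import random
from collections import deque

T_MAX = 50          # hypothetical "preset time" ending one sensing round
POOL_SIZE = 1000    # hypothetical playback pool capacity
BATCH_SIZE = 8      # hypothetical mini-batch size


def actor(state, rng):
    """Placeholder for the first network: returns (direction, distance)."""
    return (rng.uniform(0.0, 6.283), rng.uniform(0.0, 1.0))


def env_step(state, action):
    """Placeholder environment: returns (reward, next_state)."""
    t, travelled = state
    _, distance = action
    return (-distance, (t + 1, travelled + distance))


def collect_round(pool, rng):
    state = (0, 0.0)                                       # S31: state at t = 0
    while state[0] < T_MAX:                                # S35: stop at the preset time
        action = actor(state, rng)                         # S32
        reward, next_state = env_step(state, action)       # S33
        pool.append((state, action, reward, next_state))   # S34
        state = next_state


rng = random.Random(0)
replay_pool = deque(maxlen=POOL_SIZE)
collect_round(replay_pool, rng)
batch = rng.sample(list(replay_pool), BATCH_SIZE)          # S36: draw training samples
```

Using a bounded `deque` means the oldest transitions are discarded automatically once the pool is full.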
In one example, the embodiment of the present invention proposes a DDPG-based multi-agent DRL solution to train optimal paths among multiple agents. Specifically, MSTs that communicate with each other have a mixed cooperative-competitive relationship: a single MST can cooperate with other MSTs to complete field data sensing of the power system, but because the total number of data points is relatively fixed, MSTs also compete with each other to maximize their own rewards. In one example, the multi-agent DRL can adopt a framework of centralized training and decentralized execution. During training, additional information from some MSTs (e.g., actions, rewards, training periods) may be used to train the cooperative learning strategies of other MSTs. During execution, however, the private information of other MSTs is not used, which preserves the independence and autonomy of each MST.
Referring to fig. 4, fig. 4 shows a schematic diagram of data sensing based on multi-agent DRL.
As shown in Fig. 4, states, actions and rewards are the three basic elements of DRL. Given a state and a set of candidate actions, the goal of an MST is to find a policy that maximizes its cumulative reward. Assume that MST m (m = 1, 2, ..., M) observes the environment to obtain
Figure BDA0003189560760000081
and selects an action
Figure BDA0003189560760000082
which acts on the environment and yields a reward $r_t^m$. The system environment is composed of a group of states, including the current distribution of inspection data points, the MST position coordinates, historical tracks and the like. After the action
Figure BDA0003189560760000083
is executed, the environment state transitions from $s_t$ to $s_{t+1}$. The present invention defines states, actions and rewards as follows:
1) State space S: $S = \{S_1, S_2, S_3\}$ is a description of the environment, where $S_1$ represents the coordinate positions of the inspection data points and the obstacles in the two-dimensional area. It is defined as follows:
Figure BDA0003189560760000084
where
Figure BDA0003189560760000085
representing a set of obstacles within the sensing region,
Figure BDA0003189560760000086
representing a set of data points within the sensing region that need to be collected,
Figure BDA0003189560760000087
$x_n, x_c \in [0, Q_x]$, $y_n, y_c \in [0, Q_y]$, and $Q_x$ and $Q_y$ respectively denote the maximum X-axis and Y-axis coordinates of the MST data sensing region.
$S_2$ defines the current position of each MST in the two-dimensional region, expressed as follows:
Figure BDA0003189560760000091
where
Figure BDA00031895607600000912
representing a set of mobile smart devices, defined as:
Figure BDA0003189560760000093
denotes the coordinates of MST m, and
Figure BDA0003189560760000094
denotes the percentage of remaining power of MST m at time t.
$S_3$ represents the sensing time $h_t(n) \in [0, T]$ accumulated by data point n up to time t, where T is the maximum number of time periods in each data sensing round:
$h_{t+1}(n) = h_t(n) + 1$
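One possible in-memory layout of the three state components is sketched below. The class and field names (`MstStatus`, `EnvState`, `sensed`) are hypothetical, and capping the counter at T is an assumption drawn from the stated range $h_t(n) \in [0, T]$.

```python
from dataclasses import dataclass, field


@dataclass
class MstStatus:
    x: float
    y: float
    battery_pct: float      # remaining power, 0..100


@dataclass
class EnvState:
    data_points: list       # S1: (x_n, y_n) of the points to collect
    obstacles: list         # S1: (x_c, y_c) of the obstacles
    msts: list              # S2: one MstStatus per device
    sensed: dict = field(default_factory=dict)  # S3: h_t(n) per data point index

    def record_sensing(self, n: int, t_max: int) -> None:
        """Apply h_{t+1}(n) = h_t(n) + 1, capped at T (assumed from h in [0, T])."""
        self.sensed[n] = min(t_max, self.sensed.get(n, 0) + 1)


state = EnvState(
    data_points=[(2.0, 3.0), (7.5, 1.0)],
    obstacles=[(5.0, 5.0)],
    msts=[MstStatus(x=0.0, y=0.0, battery_pct=100.0)],
)
state.record_sensing(0, t_max=3)
state.record_sensing(0, t_max=3)
```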
2) Action space A: the action space defines the continuous movement behavior of the MST within the two-dimensional target region:
Figure BDA0003189560760000095
where
Figure BDA0003189560760000096
denotes the direction (angle) of the MST movement,
Figure BDA0003189560760000097
denotes the distance moved, and $l_{\max}$ denotes the maximum distance that the MST can move within a unit time t.
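A direction and distance action can be applied to an MST position as in the sketch below. The bounds `Q_X`, `Q_Y` and `L_MAX` are hypothetical values, and clipping the new position to the region boundary is an assumed way of keeping the device inside $[0, Q_x] \times [0, Q_y]$; obstacle handling is not shown.

```python
import math

Q_X, Q_Y = 10.0, 10.0   # hypothetical region bounds
L_MAX = 2.0             # hypothetical maximum distance per time step


def apply_action(x: float, y: float, theta: float, distance: float):
    """Move from (x, y) by `distance` along angle `theta` (radians)."""
    distance = max(0.0, min(L_MAX, distance))   # enforce 0 <= l <= l_max
    new_x = x + distance * math.cos(theta)
    new_y = y + distance * math.sin(theta)
    # keep the device inside [0, Q_x] x [0, Q_y]
    new_x = max(0.0, min(Q_X, new_x))
    new_y = max(0.0, min(Q_Y, new_y))
    return new_x, new_y


east = apply_action(1.0, 1.0, theta=0.0, distance=1.5)
clipped = apply_action(9.5, 1.0, theta=0.0, distance=5.0)
```

In the second call the requested distance exceeds `L_MAX` and the move would leave the region, so both limits apply.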
3) Reward R: the reward takes into account the amount of sensed data
Figure BDA0003189560760000098
and the energy consumption
Figure BDA0003189560760000099
where the energy consumption is mainly due to data sensing and movement of the MST. Suppose that
Figure BDA00031895607600000910
where α represents the energy consumed to sense a unit of data and β represents the energy consumed to move a unit of distance.
Thus, the reward r_t^m can be calculated as follows:
Figure BDA00031895607600000911
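The action and energy terms described above can be sketched in Python as follows. This is a minimal illustration, not the patented formula: the equations themselves are contained in figures that are not reproduced in this text. The function names, the clipping of the position to the region [0, Q_x] × [0, Q_y], and the ratio form of the reward (sensed data per unit of energy) are all assumptions made for this example.

```python
import math


def move(x, y, angle, distance, l_max, q_x, q_y):
    """Apply one action (direction in radians, distance) to an MST at (x, y).

    The requested distance is limited to [0, l_max]. Clipping the new
    position to [0, q_x] x [0, q_y] is an assumption of this sketch.
    Returns the new position and the distance actually travelled.
    """
    step = max(0.0, min(distance, l_max))
    new_x = min(max(x + step * math.cos(angle), 0.0), q_x)
    new_y = min(max(y + step * math.sin(angle), 0.0), q_y)
    travelled = math.hypot(new_x - x, new_y - y)
    return new_x, new_y, travelled


def energy_consumption(data_amount, distance, alpha, beta):
    """alpha: energy per unit of sensed data; beta: energy per unit of distance."""
    return alpha * data_amount + beta * distance


def efficiency_reward(data_amount, energy, eps=1e-9):
    """ASSUMED reward shape: amount of sensed data per unit of energy."""
    return data_amount / (energy + eps)
```

For example, an MST at the origin that is asked to move 5 units with l_max = 2 ends up 2 units away, and sensing 3 units of data with α = 0.5 and β = 0.25 then costs 2 units of energy.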
Each MST is trained with four deep neural networks: the Actor network μ(s | θ^μ), the Critic network Q(s, a | θ^Q), and the target networks T_Actor and T_Critic. In the embodiment of the present invention, as shown in fig. 5, the Actor network and the target network T_Actor constitute the first network, and the Critic network Q(s, a | θ^Q) and the target network T_Critic constitute the second network. Here θ^μ and θ^Q are the randomly initialized parameters of the Actor and Critic networks, and the initial parameters of the T_Actor and T_Critic networks are θ^{μ'} := θ^μ and θ^{Q'} := θ^Q. The function of each neural network is as follows:
1) Actor network: responsible for iteratively updating the parameters θ^μ and for selecting the current output action a_t based on the input current environment state s_t; the action is used to interact with the environment, which generates the next state s_{t+1} and returns the current reward r_t.
2) Critic network: responsible for iteratively updating the parameters θ^Q and for calculating the current Q value Q(s, a | θ^Q).
3) T_Actor network: selects the optimal next action a'_t according to the mini-batch of training samples sampled from the experience playback pool B; its network parameters θ^{μ'} are periodically updated from θ^μ.
4) T_Critic network: responsible for calculating Q'(s', a' | θ^{Q'}); its network parameters θ^{Q'} are periodically updated from θ^Q.
In each round of data sensing, the MST first observes the environment state s_t ∈ S and inputs the state s_t to the Actor network to generate an action output a_t. To increase the randomness of the training process and the exploration of the state space, noise can be added to the selected action; a_t is calculated as follows:
Figure BDA0003189560760000101
wherein the noise
Figure BDA0003189560760000102
can be generated by an Ornstein-Uhlenbeck (OU) random process.
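As a rough illustration of this exploration noise, the sketch below discretises an Ornstein-Uhlenbeck process with an Euler-Maruyama step and adds it to an action made of a direction and a distance. The class and function names, the default values of theta, sigma and dt, and the assumption that the direction lies in [0, 2π) are choices made for this example; the patent does not specify them in the text.

```python
import math
import random


class OUNoise:
    """Discretised Ornstein-Uhlenbeck process used as exploration noise."""

    def __init__(self, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, seed=None):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.rng = random.Random(seed)
        self.x = mu  # start at the long-run mean

    def sample(self):
        # Euler-Maruyama step: dx = theta*(mu - x)*dt + sigma*sqrt(dt)*N(0, 1)
        drift = self.theta * (self.mu - self.x) * self.dt
        diffusion = self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
        self.x += drift + diffusion
        return self.x


def noisy_action(angle, distance, angle_noise, dist_noise, l_max):
    """Add OU noise to the actor output and keep the action in a valid range."""
    noisy_angle = (angle + angle_noise.sample()) % (2.0 * math.pi)
    noisy_dist = min(max(distance + dist_noise.sample(), 0.0), l_max)
    return noisy_angle, noisy_dist
```

Because successive OU samples are correlated, the perturbation drifts smoothly rather than jumping independently at every step, which is why this process is a common choice for continuous movement actions.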
In the centralized training process, each MST has a private playback pool B_m that stores state transition samples (s_t, a_t, r_t, s_{t+1}), and H samples are drawn from each private playback pool to form H groups of mini-batch training samples. The target T_Actor network outputs the action a'_t from the mini-batch training samples, and the Critic network updates its own parameters by minimizing the loss function Loss(θ^Q), given by:
Figure BDA0003189560760000103
The target value y_t of the training sample is calculated according to the following formula:
Figure BDA0003189560760000104
where γ ∈ (0, 1) represents the discount factor.
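The target value and the Critic loss can be sketched as below. The exact formulas are contained in figures that are not reproduced in this text, so this example assumes the standard deep deterministic policy gradient form, y_t = r_t + γ·Q'(s_{t+1}, a'_t) with a mean squared error loss. The networks are passed in as plain callables, and all function names are hypothetical.

```python
def td_targets(batch, target_actor, target_critic, gamma):
    """Assumed form: y_t = r_t + gamma * Q'(s_{t+1}, a'_t), a'_t from T_Actor."""
    targets = []
    for _state, _action, reward, next_state in batch:
        next_action = target_actor(next_state)
        targets.append(reward + gamma * target_critic(next_state, next_action))
    return targets


def critic_loss(batch, targets, critic):
    """Mean squared error between the targets and the current Q values."""
    errors = [
        (y - critic(state, action)) ** 2
        for (state, action, _reward, _next_state), y in zip(batch, targets)
    ]
    return sum(errors) / len(errors)


# Toy usage with scalar states and actions and hand-written "networks".
batch = [(0.0, 1.0, 1.0, 2.0), (1.0, 0.5, 0.0, 3.0)]
targets = td_targets(batch, lambda s: 0.5 * s, lambda s, a: s + a, gamma=0.9)
loss = critic_loss(batch, targets, lambda s, a: s + a)
```

In a real implementation the callables would be neural networks and the loss would be minimised by gradient descent on θ^Q; the sketch only shows how the quantities relate to each other.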
Then, the Actor network is updated according to the policy gradient, and the formula is as follows:
Figure BDA0003189560760000105
wherein,
Figure BDA0003189560760000106
It is noted that the target networks, i.e., the T_Actor network and the T_Critic network, are copies of the Actor and Critic networks, respectively. After the parameters of the Actor network and the Critic network are updated, the parameters of the T_Actor and T_Critic networks may be updated in a soft update manner to improve the stability of the learning process. The details are as follows:
θ^{μ'} := τθ^μ + (1 − τ)θ^{μ'}
θ^{Q'} := τθ^Q + (1 − τ)θ^{Q'}
where τ is the update factor, which generally takes a small value (e.g., 0.1 or 0.01). After sufficient learning, all parameters of the neural networks are optimized to obtain the optimal action selection strategy, and thus the optimal routing inspection path. Completing the data sensing task along the optimal routing inspection path can reduce the energy consumption cost.
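The initial copy and the soft update of the target parameters can be illustrated with the short sketch below, in which each network is represented by a flat list of floats. The function names are hypothetical; an actual implementation would apply the same rule to every weight tensor of the networks.

```python
def hard_copy(online_params):
    """Initial target parameters: theta' := theta."""
    return list(online_params)


def soft_update(target_params, online_params, tau):
    """theta' := tau * theta + (1 - tau) * theta', element by element."""
    return [
        tau * online + (1.0 - tau) * target
        for target, online in zip(target_params, online_params)
    ]


# With tau = 0.1 the target moves one tenth of the way towards the online value.
actor_params = [1.0, 1.0]
target_actor_params = soft_update([0.0, 1.0], actor_params, tau=0.1)
```

A small τ means the target networks change slowly, so the target values used to train the Critic network stay nearly fixed between consecutive updates.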
In one example, the multi-agent optimal patrol path learning process is as shown in table 1 below:
Figure BDA0003189560760000107
Figure BDA0003189560760000111
TABLE 1
Referring to fig. 6, fig. 6 is a flowchart of the steps for uploading sensing data to a preset block chain according to an embodiment of the present invention.
In this embodiment of the present invention, the step of uploading the sensing data to the preset block chain may include:
S61, obtaining a private key;
S62, signing the sensing data by using the private key to obtain encrypted data;
S63, sending the encrypted data to a preset authentication and authorization center for verification;
and S64, when the verification passing information returned by the authentication and authorization center is received, sending the encrypted data to a preset block chain.
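Steps S61 to S64 can be sketched as follows. The patent does not name a signature scheme, so this example uses a toy textbook RSA key built from two small known primes purely to show private-key signing and public-key verification with the Python standard library. The key is far too small and the scheme is unpadded, so it must not be used in practice, and every name in the block is hypothetical.

```python
import hashlib

# Toy textbook-RSA key built from two known Mersenne primes.
# Illustration only: much too small and unpadded for real use.
P, Q, E = 2**61 - 1, 2**31 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))  # S61: private exponent (Python 3.8+)


def digest(data: bytes) -> int:
    """SHA-256 digest of the sensing data, reduced modulo N."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes, private_exponent: int = D) -> int:
    """S62: sign the digest of the sensing data with the private key."""
    return pow(digest(data), private_exponent, N)


def center_verify(data: bytes, signature: int, public_exponent: int = E) -> bool:
    """S63: the authentication and authorization center checks the signature."""
    return pow(signature, public_exponent, N) == digest(data)


def upload(data: bytes, chain: list) -> bool:
    """S61-S64: sign, verify at the center, append to the chain on success."""
    signature = sign(data)
    if not center_verify(data, signature):
        return False
    chain.append((data, signature))
    return True
```

A production system would instead rely on a vetted signature scheme, such as the ECDSA signatures used for Ethereum transactions, supplied by an audited cryptographic library.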
In practical applications, in order to provide a secure data storage and sharing service, a private chain may be constructed based on Ethereum, wherein the private chain includes miner nodes and non-miner nodes.
Ethereum is an open-source public blockchain platform with smart contract functionality; it provides a decentralized Ethereum Virtual Machine (EVM) to process peer-to-peer contracts through its dedicated cryptocurrency Ether (abbreviated as "ETH").
1) Miner node: on the basis of satisfying the block chain consensus protocol, writes data sharing transactions into blocks and participates in transaction verification. Considering that participating in block consensus and verification requires a certain amount of computing power from the MST, the embodiment of the present invention introduces an edge computing server to help the MST complete the computing tasks and to add the verified blocks to the blockchain network.
2) Non-miner node: since a non-miner node is only responsible for accepting and broadcasting data sharing transaction requests, it need not possess the same computing resources as a miner node.
According to the characteristics of the block chain, each Ethereum node keeps a complete and verified copy of the block chain and of the MST services. Assuming that each MST is actively connected to the blockchain network, after each round of data sensing, the MST uploads the data sensed from the power system site to the blockchain. Therefore, by the u-th round, the total number of data storage requests issued by all MSTs is Y = M × u, where M is the number of MSTs and u is the number of data sensing rounds. In addition, to ensure that the data sent to the blockchain network is valid and not forged, the MST needs to sign the data using its own private key so that the legitimacy of the data source can be verified. The data sharing process is shown in table 2 below.
Figure BDA0003189560760000121
Figure BDA0003189560760000131
TABLE 2
In Table 2, id_m represents the identity obtained after registration of the MST numbered m;
cert_m represents the authentication certificate obtained after registration of the MST numbered m;
pk_m represents the public key of the MST numbered m;
sk_m represents the private key (secret key) of the MST numbered m;
wa_m represents the blockchain wallet address of the MST numbered m;
Figure BDA0003189560760000141
represents the total amount of data accumulated by the MST numbered m in the u-th data sensing round;
Figure BDA0003189560760000142
represents encrypting the collected data using a hash algorithm;
Figure BDA0003189560760000143
represents signing the encrypted data using the private key sk_m of MST m;
Figure BDA0003189560760000144
represents the authentication and authorization center verifying the encrypted data packet that carries the private key signature, to prevent data from being forged.
In a specific implementation, Gen(1^v) is first executed to obtain the public-private key pair (pk_m, sk_m) of MST m, where 1^v denotes the security parameter. The total number of data sensing rounds of the MST is U, and each round comprises T time periods. Thus, the amount of data cumulatively sensed by the MST after the end of each data sensing round is as follows:
Figure BDA0003189560760000145
wherein,
Figure BDA0003189560760000146
represents the amount of data sensed by MST m in the t-th time period of sensing round u. The MST signs the data with its own private key sk_m and then sends the data to the authentication and authorization center for verification. The authentication and authorization center decrypts the data packet and verifies whether it was sent by a legitimate MST. In addition, the value obtained after decryption,
Figure BDA0003189560760000147
is compared with the data stored in the MST so as to prevent data from being forged. After receiving the storage request from the MST, the blockchain verifies the signature on the request and determines, based on the verification result, whether to submit the transaction to the blockchain network. The submitted request transaction will be mined by a miner node and written into a new block.
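How a verified storage request ends up in a new block can be illustrated with the simplified sketch below. It models only the hash link between consecutive blocks; mining, the consensus protocol and the Ethereum data structures are deliberately omitted. The signature check is supplied by the caller as a hypothetical callable, and the request fields used in the usage example are invented for illustration.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()


def handle_storage_request(chain: list, request: dict, verify_signature) -> bool:
    """Append the request as a new block only if its signature verifies."""
    if not verify_signature(request):
        return False
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "prev_hash": prev, "tx": request})
    return True


def chain_is_consistent(chain: list) -> bool:
    """Check that every block points at the hash of its predecessor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


# Usage with a placeholder signature check (hypothetical request fields).
ledger = []
is_valid = lambda req: req.get("sig") == "valid"
handle_storage_request(ledger, {"mst": 1, "data": "a", "sig": "valid"}, is_valid)
handle_storage_request(ledger, {"mst": 2, "data": "b", "sig": "valid"}, is_valid)
rejected = handle_storage_request(ledger, {"mst": 3, "data": "c", "sig": "bad"}, is_valid)
```

Because each block stores the hash of the previous one, changing stored data in an earlier block breaks the link to its successor, which is the property that makes forged or altered records detectable.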
The distributed storage and sharing of the routing inspection data of the power system are realized by using the block chain, and the safety and the integrity of the power data are ensured. Meanwhile, the decentralized sharing mode improves the flexibility and the access efficiency of the system.
Referring to fig. 7, fig. 7 is a block diagram of a data sharing device of an electric power internet of things according to an embodiment of the present invention.
The embodiment of the invention provides a data sharing device of an electric power Internet of things, which is applied to mobile intelligent equipment and comprises:
an optimal routing inspection path training module 701, configured to train an optimal routing inspection path at a routing inspection site of the power system;
a perception data obtaining module 702, configured to patrol the power system based on the optimal patrol route, and obtain perception data;
an uploading module 703, configured to upload the sensing data to a preset block chain.
In this embodiment of the present invention, the optimal routing inspection path training module 701 includes:
the current environment state acquisition submodule is used for acquiring the current environment state of the power system at the current moment of the inspection site;
the current output action acquisition submodule is used for inputting the current environment state into a preset first network to obtain a current output action;
the reward calculation submodule is used for calculating reward according to the current output action and the current environment state and generating the next environment state;
the storage submodule is used for storing the current environment state, the current output action, the reward and the next environment state in a preset playback pool;
the circulation submodule is used for judging whether the current time is equal to the preset time or not, if not, the time corresponding to the next environment state is adopted as the current time, and the step of inputting the current environment state into the preset first network to obtain the current output action is returned until the current time is equal to the preset time;
the strategy gradient generation submodule is used for acquiring a training sample from the playback pool, calculating the target value of the training sample through a preset second network, and obtaining a strategy gradient based on the target value;
the second optimization parameter obtaining submodule is used for optimizing a second network based on the target value to obtain a second optimization parameter;
the first optimization parameter obtaining submodule is used for optimizing the first network by adopting strategy gradient to obtain a first optimization parameter;
and the optimal routing inspection path generation submodule is used for generating an optimal routing inspection path based on the first optimization parameter and the second optimization parameter.
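The interaction between the storage submodule and the circulation submodule can be sketched as a simple collection loop. In this illustration the first network and the environment are passed in as plain callables, the preset time is assumed to be a non-negative integer number of steps, and all class and function names are hypothetical.

```python
import random
from collections import deque


class ReplayPool:
    """Fixed-capacity playback pool of (s_t, a_t, r_t, s_{t+1}) transitions."""

    def __init__(self, capacity=10000, seed=None):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the pool size.
        size = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), size)


def collect_round(initial_state, actor, env_step, pool, preset_time):
    """Run one sensing round of `preset_time` steps and fill the pool.

    `actor(state)` returns an action; `env_step(state, action)` returns
    (reward, next_state). Both are placeholders for the real components.
    """
    state, t = initial_state, 0
    while t != preset_time:
        action = actor(state)
        reward, next_state = env_step(state, action)
        pool.store(state, action, reward, next_state)
        state, t = next_state, t + 1
    return state
```

After a round has been collected, `pool.sample(H)` would provide the training samples from which the target values and the strategy gradient are computed.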
In this embodiment of the present invention, the upload module 703 includes:
the private key obtaining sub-module is used for obtaining a private key;
the encryption submodule is used for signing the sensing data by adopting a private key to obtain encrypted data;
the verification sub-module is used for sending the encrypted data to a preset authentication authorization center for verification;
and the uploading sub-module is used for sending the encrypted data to the preset block chain when receiving the verification passing information returned by the authentication and authorization center.
In an embodiment of the present invention, the upload sub-module includes:
the uploading unit is used for sending a storage request to the block chain when receiving verification passing information returned by the authentication and authorization center; the storage request carries encrypted data; the block chain is used for verifying the encrypted data and storing the sensing data when the verification is passed.
The invention also provides an electronic device comprising a processor and a memory:
the memory is used for storing the program codes and transmitting the program codes to the processor;
the processor is used for executing the data sharing method of the power internet of things according to the instructions in the program codes.
The embodiment of the invention also provides a computer-readable storage medium, which is used for storing the program codes, and the program codes are used for executing the data sharing method of the power internet of things in the embodiment of the invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A data sharing method of an electric power Internet of things, applied to mobile intelligent equipment, the method comprising the following steps:
training an optimal routing inspection path at a routing inspection site of the power system;
patrolling the power system based on the optimal patrolling path to obtain sensing data;
and uploading the sensing data to a preset block chain.
2. The method of claim 1, wherein the step of training an optimal routing inspection path at a routing inspection site of the power system comprises:
acquiring the current environmental state of the power system at the current moment of the inspection site;
inputting the current environment state into a preset first network to obtain a current output action;
calculating rewards according to the current output action and the current environment state, and generating a next environment state;
storing the current environmental state, the current output action, the reward, and the next environmental state in a preset playback pool;
judging whether the current time is equal to a preset time or not, if not, adopting the time corresponding to the next environment state as the current time, and returning to the step of inputting the current environment state into a preset first network to obtain a current output action until the current time is equal to the preset time;
acquiring a training sample from the playback pool, calculating a target value of the training sample through a preset second network, and obtaining a strategy gradient based on the target value;
optimizing the second network based on the target value to obtain a second optimization parameter;
optimizing the first network by adopting the strategy gradient to obtain a first optimization parameter;
and generating an optimal routing inspection path based on the first optimization parameter and the second optimization parameter.
3. The method of claim 1, wherein the step of uploading the perception data to a preset blockchain comprises:
obtaining a private key;
signing the perception data by adopting the private key to obtain encrypted data;
sending the encrypted data to a preset authentication and authorization center for verification;
and when receiving verification passing information returned by the authentication and authorization center, sending the encrypted data to a preset block chain.
4. The method according to claim 3, wherein the step of sending the encrypted data to a preset block chain when receiving verification passing information returned by the certificate authority comprises:
when verification passing information returned by the authentication and authorization center is received, a storage request is sent to the block chain; the storage request carries the encrypted data; and the block chain is used for verifying the encrypted data and storing the sensing data when the verification is passed.
5. A data sharing device of an electric power Internet of things, characterized in that the device is applied to mobile intelligent equipment and comprises:
the optimal routing inspection path training module is used for training an optimal routing inspection path on a routing inspection site of the power system;
the perception data acquisition module is used for patrolling the power system based on the optimal patrolling path to acquire perception data;
and the uploading module is used for uploading the sensing data to a preset block chain.
6. The apparatus of claim 5, wherein the optimal routing inspection path training module comprises:
the current environment state acquisition submodule is used for acquiring the current environment state of the power system at the current moment of the inspection site;
the current output action acquisition submodule is used for inputting the current environment state into a preset first network to obtain a current output action;
the reward calculation submodule is used for calculating reward according to the current output action and the current environment state and generating the next environment state;
a storage submodule, configured to store the current environment state, the current output action, the reward, and the next environment state in a preset playback pool;
the circulation submodule is used for judging whether the current time is equal to a preset time or not, if not, the time corresponding to the next environment state is taken as the current time, and the step of inputting the current environment state into a preset first network to obtain the current output action is returned until the current time is equal to the preset time;
the strategy gradient generation submodule is used for acquiring a training sample from the playback pool, calculating a target value of the training sample through a preset second network, and obtaining a strategy gradient based on the target value;
a second optimization parameter obtaining submodule, configured to optimize the second network based on the target value, so as to obtain a second optimization parameter;
a first optimization parameter obtaining submodule, configured to optimize the first network by using the policy gradient to obtain a first optimization parameter;
and the optimal routing inspection path generation submodule is used for generating an optimal routing inspection path based on the first optimization parameter and the second optimization parameter.
7. The apparatus of claim 5, wherein the upload module comprises:
the private key obtaining sub-module is used for obtaining a private key;
the encryption submodule is used for signing the sensing data by adopting the private key to obtain encrypted data;
the verification sub-module is used for sending the encrypted data to a preset authentication authorization center for verification;
and the uploading sub-module is used for sending the encrypted data to a preset block chain when receiving verification passing information returned by the authentication and authorization center.
8. The apparatus of claim 7, wherein the upload sub-module comprises:
the uploading unit is used for sending a storage request to the block chain when receiving verification passing information returned by the authentication and authorization center; the storage request carries the encrypted data; and the block chain is used for verifying the encrypted data and storing the sensing data when the verification is passed.
9. An electronic device, comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is used for executing the data sharing method of the power internet of things as claimed in any one of claims 1-4 according to instructions in the program code.
10. A computer-readable storage medium for storing program code for performing the data sharing method of the power internet of things of any one of claims 1-4.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110873613.6A CN113537625A (en) 2021-07-30 2021-07-30 Data sharing method considering energy consumption efficiency in power internet of things based on block chain

Publications (1)

Publication Number Publication Date
CN113537625A true CN113537625A (en) 2021-10-22

Family

ID=78121645

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110873613.6A Pending CN113537625A (en) 2021-07-30 2021-07-30 Data sharing method considering energy consumption efficiency in power internet of things based on block chain

Country Status (1)

Country Link
CN (1) CN113537625A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109446830A (en) * 2018-11-13 2019-03-08 中链科技有限公司 Data center environment information processing method and device based on block chain
CN111754020A (en) * 2020-05-14 2020-10-09 大唐七台河发电有限责任公司 Intelligent power plant equipment inspection system and method based on data analysis
CN112003886A (en) * 2020-07-03 2020-11-27 北京工业大学 Block chain-based Internet of things data sharing system and method
CN112581026A (en) * 2020-12-29 2021-03-30 杭州趣链科技有限公司 Joint path planning method for logistics robot on alliance chain

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CHI HAROLD LIU , QIUXIA LIN , SHILIN WEN: "Blockchain-Enabled Data Collection and Sharing for Industrial IoT With Deep Reinforcement Learning", 《IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS》 *
CHI HAROLD LIU; ZHEYU CHEN; YUFENG ZHAN: "Energy-Efficient Distributed Mobile Crowd Sensing: A Deep Learning Approach", 《IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS 》 *
TING CAI,YUXIN WU,HUI LIN,YU CAI: "Blockchain-Empowered Big Data Sharing for Internet of Things", 《INTERNATIONAL JOURNAL OF WEB SERVICES RESEARCH》 *
格雷拉-皮奇: "DDPGN algorithm flow" (DDPGN算法流程), 《HTTPS://BLOG.CSDN.NET/WEIXIN_43897187/ARTICLE/DETAILS/109673711》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211022
