CN112419064A - Energy transaction method, device and equipment based on deep reinforcement learning and alliance chain - Google Patents
- Publication number: CN112419064A (application number CN202011420188.7A)
- Authority: CN (China)
- Prior art keywords: matrix, transaction, energy, network, neural network
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F21/64—Protecting data integrity, e.g. using checksums, certificates or signatures
- G06N3/045—Combinations of networks
- G06N3/084—Backpropagation, e.g. using gradient descent
- G06Q20/065—Private payment circuits, e.g. involving electronic currency used among participants of a common payment scheme, using e-cash
- G06Q20/10—Payment architectures specially adapted for electronic funds transfer [EFT] systems; specially adapted for home banking systems
- G06Q50/06—Energy or water supply
Abstract
The invention relates to an energy transaction method, device and equipment based on deep reinforcement learning and an alliance (consortium) chain. A first state matrix is formed by collecting the N state vectors that affect buyers and sellers in an energy trading venue; the state matrix is processed and analysed by a neural network model to obtain an action matrix, a second state matrix and a reward matrix; and the neural network model is trained on the first state matrix, the action matrix, the second state matrix and the reward matrix to obtain a neural network training model. Applied to P2P electricity trading between electric vehicles on the basis of this training model and the alliance chain, the energy transaction method maximizes the long-term income of the electric vehicles participating in the transactions, and the introduced alliance chain ensures the privacy and safety of electric vehicle electricity trading, thereby solving the technical problem of how to enable buyers and sellers to obtain the maximum long-term benefit in alliance-chain-based P2P electricity trading.
Description
Technical Field
The invention relates to the technical field of the Internet of Vehicles, and in particular to an energy transaction method, device and equipment based on deep reinforcement learning and an alliance chain.
Background
With the continuing electrification of automobiles, new carmaking forces in China have risen to prominence, traditional automobile enterprises are shifting to new energy, and user acceptance of electric vehicles is gradually increasing. However, the growing number of electric vehicles poses no small challenge for large-scale charging. First, there is a shortage of charging stations for electric vehicles; in addition, electric vehicles are usually charged at night, and too many vehicles charging in the same period easily causes excessive power loss, voltage drops and overload in the supply grid.
To address the above problems, a mechanism for peer-to-peer (P2P) electricity trading between electric vehicles has been proposed. In P2P electricity trading, the participating electric vehicles are regarded as "prosumers": according to its own condition, a vehicle can directly purchase the electricity it needs from other electric vehicles or sell its surplus electricity, and after the buyer and seller negotiate and agree on a price, the electricity is transferred through the smart grid. The P2P electricity trading mechanism can not only relieve the grid load at peak consumption times, but also reduce the costs of the electricity-buying vehicle and increase the income of the electricity-selling vehicle. However, P2P electricity trading also exposes participants to privacy disclosure. Because a blockchain is a distributed shared ledger and database, it has the characteristics of decentralization, openness, independence, security, anonymity and so on. Blockchain-based P2P electricity trading has therefore emerged: it not only resolves the information asymmetry between buyer and seller and creates a trustworthy trading environment for the participants, but also allows participants to trade anonymously, protecting traders' privacy to the greatest extent. However, the security of blockchains is built on heavy computation, while electric vehicles generally lack sufficient computing power, so many researchers instead choose an efficient, scalable alliance chain.
An alliance chain is a special kind of blockchain. Unlike the consensus mechanism of a public blockchain, in which every node is a bookkeeper, only a number of preselected authoritative nodes in an alliance chain serve as bookkeepers, collecting and managing local transaction records at moderate cost; the other access nodes may participate in transactions without taking part in bookkeeping, so no consensus mechanism that consumes significant computing power and extra time is required.
Currently, alliance-chain-based P2P electricity trading mechanisms for charging vehicles only discuss maximizing the current income of the charging vehicles or of a parking lot, without taking the long-term income of the charging vehicles into account. In real life, the number of vehicles in a parking lot changes dynamically: if a seller vehicle sells its surplus electricity at the current moment, it may lose the opportunity to trade with a higher-bidding vehicle at a later moment and thus forgo a higher profit. Therefore, in alliance-chain-based P2P electricity trading, how to provide charging vehicles with the buyer-seller matching strategy that maximizes long-term income is a problem to be solved urgently.
Disclosure of Invention
The embodiment of the invention provides an energy transaction method, device and equipment based on deep reinforcement learning and an alliance chain, used to solve the technical problem of how to enable buyers and sellers to obtain the maximum long-term benefit in alliance-chain-based P2P electricity trading.
In order to achieve the above object, the embodiments of the present invention provide the following technical solutions:
an energy transaction method based on deep reinforcement learning and an alliance chain, applied to electricity trading between electric vehicles, comprises the following steps:
S10, collecting transaction characteristics of an energy trading venue, forming the transaction characteristics into state vectors, and forming a first state matrix from the N state vectors in the venue at time t; the transaction characteristics comprise the remaining parking time of each electric vehicle in the venue, its buy/sell label, its transaction energy and its transaction price;
S20, inputting the first state matrix into a deep reinforcement learning neural network model and outputting an action matrix;
S30, computing a second state matrix and a reward matrix for time t+1 from the action matrix and the first state matrix through a state transfer function and a reward function; the first state matrix, the action matrix, the second state matrix and the reward matrix form a training matrix, which is stored in the replay pool of the neural network model;
S40, every Δt, sampling m training matrices from the replay pool to train the neural network model until its loss function converges or the maximum number of iterations is reached, obtaining a trained neural network training model;
and S50, in the energy trading venue, inputting the state matrix formed from the transaction characteristics of the buyers and sellers into the neural network training model to obtain the energy each buyer and seller trades.
Preferably, before storing the training matrix in the replay pool of the neural network model, the method further includes processing abnormal values of the first state matrix, the action matrix and the second state matrix:
deleting values of the first state matrix and the second state matrix that fall outside the preset value range and replacing them with 0;
and deleting the elements of the action matrix for which the buyer's price is less than the seller's price and replacing them with 0.
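The outlier handling described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: `clean_state` and `clean_actions` are hypothetical helper names, and the buyer/seller label convention (buyer = -1, seller = +1) follows the convention stated elsewhere in this document.

```python
import numpy as np

def clean_state(S, lo, hi):
    """Zero out state-matrix entries outside the preset value range [lo, hi]."""
    S = S.copy()
    S[(S < lo) | (S > hi)] = 0.0
    return S

def clean_actions(A, prices, labels):
    """Zero action elements a_ij where a buyer's price is below the seller's.

    A[i, j] is the energy vehicle i buys from vehicle j; labels hold -1
    (buyer) and +1 (seller), matching the patent's buy/sell labels.
    """
    A = A.copy()
    n = A.shape[0]
    for i in range(n):
        for j in range(n):
            if labels[i] == -1 and labels[j] == 1 and prices[i] < prices[j]:
                A[i, j] = 0.0
    return A
```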
Preferably, outputting the action matrix specifically includes: cutting the vector output by the neural network model into N vectors, each comprising N elements, the N vectors constituting the N×N action matrix; each vector represents the energy one electric vehicle trades with each of the other N−1 vehicles.
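As an illustration of this cutting step, the sketch below reshapes a flat network output of N·N values into the N×N action matrix; `vector_to_action_matrix` is a hypothetical helper name, and zeroing the diagonal is an added assumption (a vehicle presumably does not trade with itself), not a detail stated in the source.

```python
import numpy as np

def vector_to_action_matrix(v, n):
    """Cut a flat output of n*n values into n vectors of n elements each,
    stacked as the n-by-n action matrix. Row i holds the energy vehicle i
    trades with each other vehicle; the diagonal is zeroed (assumption)."""
    A = np.asarray(v, dtype=float).reshape(n, n)
    np.fill_diagonal(A, 0.0)  # assumed: no self-trading
    return A
```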
Preferably, the state transfer function f(S_t, A_t) maps the first state matrix and the action matrix to the second state matrix, whose row s_i^{t+1} = (δ_i^{t+1}, e_i^{t+1}, p_i^{t+1}, z_i^{t+1}) is the second state vector of electric vehicle i; its components respectively denote the remaining parking time, the transaction electric quantity, the transaction price and the buy/sell label of electric vehicle i at time t+1;
the expression of the transaction electric quantity required by the electric automobile i at the moment t +1 is as follows:
in the formula (I), the compound is shown in the specification,the transaction power amount required by the electric automobile i at the moment t,for the buying and selling label of the electric automobile i at the time t,energy purchased by the electric automobile i to the electric automobile j at the moment t for the elements in the action matrix;
in the formula (I), the compound is shown in the specification,respectively, the mean and variance of the normal distribution satisfied by the variable x;
When the vehicle remains in the trading venue at time t+1, the remaining parking time and the buy/sell label of electric vehicle i at time t+1 are expressed in terms of δ_i^t, the remaining parking time of electric vehicle i at time t;
when the vehicle's parking time has elapsed and its place is taken by a new arrival, the remaining parking time, the transaction electric quantity and the buy/sell label of electric vehicle i at time t+1 are drawn from normal distributions, where μ₂ and σ₂² are respectively the mean and variance of the normal distribution satisfied by the stay time of an electric vehicle in the energy trading venue, and μ₃ and σ₃² are respectively the mean and variance of the normal distribution satisfied by the energy the electric vehicle needs to trade.
Preferably, the reward matrix R_t collects the per-vehicle rewards, where r_i^t is the reward of electric vehicle i at time t, k1, k2, k3 and k4 are constants, s_t is the penalty factor at time t, a_ij^t, an element of the action matrix, is the energy purchased by electric vehicle i from electric vehicle j at time t, e_i^t is the energy required by electric vehicle i at time t, p_i^t is the energy price of electric vehicle i at time t, and z_i^t is the buy/sell label value of electric vehicle i at time t.
Preferably, training the neural network model specifically includes: iteratively updating the parameters of the Critic network and the Actor network until the loss function converges or the maximum number of iterations is reached, wherein the Critic network comprises a Critic evaluation network and a Critic target network, and the Actor network comprises an Actor evaluation network and an Actor target network;
wherein the loss function of the Critic network is:

L2 = (1/m) Σ_{k=1}^{m} (r_k + γ·q'_k − q_k)²

and the loss function of the Actor network is:

L1 = −(1/m) Σ_{k=1}^{m} q_k

where L1 is the loss of the Actor network; L2 is the loss of the Critic network; γ is the discount coefficient; r_k is the reward of sample k; q_k, the output of the Critic evaluation network, is the Q value corresponding to sample k; q'_k, the output of the Critic target network, is the Q value corresponding to the next moment of sample k; and k ∈ {1, 2, …, m}. The expressions for q_k and q'_k are:

q_k = ReLU(W2·S_k + W3·A_k + b2)
q'_k = ReLU(W'2·S_{k+1} + W'3·μ'(S_{k+1}) + b'2)

where W2 and W3 are the weight matrices of the Critic evaluation network's output layer, b2 is the bias vector of the Critic evaluation network's output layer, W'2 and W'3 are the weight matrices of the Critic target network's output layer, b'2 is the bias vector of the Critic target network's output layer, and μ'(S_{k+1}) is the output obtained by feeding S_{k+1} into the Actor target network, representing the optimal action matrix corresponding to state S_{k+1};
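A minimal numeric sketch of the Critic output-layer expression q = ReLU(W2·S + W3·A + b2) described above, using NumPy; the small weight matrices and flattened state/action vectors here are illustrative, not values from the patent.

```python
import numpy as np

def relu(x):
    """Rectified linear unit, applied element-wise."""
    return np.maximum(x, 0.0)

def critic_q(S, A, W2, W3, b2):
    """Output layer of the Critic evaluation network:
    q = ReLU(W2 @ S + W3 @ A + b2),
    where S and A are flattened state and action vectors."""
    return relu(W2 @ S + W3 @ A + b2)
```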
the back propagation algorithm expression for iteratively updating the parameters of the Actor valuation network and the criticic valuation network is as follows:
the soft update algorithm expression for iteratively updating the parameters of the Actor valuation network and the criticic valuation network is as follows:
W′←τW+(1-τ)W′
wherein W is the parameter W of Critic evaluation network and Actor evaluation network2,W3,b2,For two loss functions L1,L2The gradient of (a) is that W 'represents parameters of a Critic target network and an Actor target network, alpha is a training factor and has a value range of [0,1 ], and tau is a coefficient for controlling the influence of an old parameter W' of the target network and a parameter W of an evaluation network on the target network.
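The two update rules above can be sketched directly; `gradient_step` and `soft_update` are illustrative names, not from the source.

```python
import numpy as np

def gradient_step(W, grad, alpha=0.001):
    """Back-propagation update for an evaluation-network parameter:
    W <- W - alpha * dL/dW, with training factor alpha in [0, 1)."""
    return W - alpha * grad

def soft_update(W_target, W_eval, tau=0.01):
    """Soft update for a target-network parameter:
    W' <- tau * W + (1 - tau) * W'."""
    return tau * W_eval + (1.0 - tau) * W_target
```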
Preferably, obtaining the matched transactions between buyers and sellers in the energy trading venue further includes: at a given moment, inputting the state matrix formed from the transaction characteristics of the buyers and sellers needing to trade into the neural network training model, which outputs the action matrix of the trades.
The invention also provides an energy transaction device based on deep reinforcement learning and an alliance chain, comprising a data acquisition module, a first processing module, a second processing module, a training module and an output module;
the data acquisition module is used for acquiring transaction characteristics of the energy trading venue and forming them into state vectors, the N state vectors in the venue at time t forming a first state matrix; the transaction characteristics comprise the remaining parking time of each electric vehicle in the venue, its buy/sell label, its transaction energy and its transaction price;
the first processing module is used for inputting the first state matrix into a deep reinforcement learning neural network model and outputting an action matrix;
the second processing module is used for computing a second state matrix and a reward matrix for time t+1 from the action matrix and the first state matrix through a state transfer function and a reward function; the first state matrix, the action matrix, the second state matrix and the reward matrix form a training matrix, which is stored in the replay pool of the neural network model;
the training module is used for sampling m training matrices from the replay pool every Δt to train the neural network model until its loss function converges or the maximum number of iterations is reached, obtaining a trained neural network training model;
and the output module is used for inputting the state matrix formed from the transaction characteristics of the buyers and sellers in the energy trading venue into the neural network training model to obtain the energy each buyer and seller trades.
The present invention also provides a computer-readable storage medium for storing computer instructions that, when executed on a computer, cause the computer to perform the deep reinforcement learning and federation chain based energy trading method described above.
The invention also provides terminal equipment, which comprises a processor and a memory;
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is used for executing the energy trading method based on the deep reinforcement learning and the alliance chain according to instructions in the program code.
According to the technical scheme, the embodiment of the invention has the following advantages. The energy trading method, device and equipment based on deep reinforcement learning and an alliance chain form a first state matrix by collecting the N state vectors that affect buyers and sellers in an energy trading venue; the state matrix is processed and analysed by the neural network model to obtain an action matrix, a second state matrix and a reward matrix; and the neural network model is trained on the first state matrix, the action matrix, the second state matrix and the reward matrix to obtain a neural network training model. Applied to P2P electricity trading between electric vehicles on the basis of this training model and the alliance chain, the method maximizes the long-term income of the electric vehicles participating in the transactions, and the introduced alliance chain ensures the privacy and safety of electric vehicle electricity trading, thereby solving the technical problem of how to enable buyers and sellers to obtain the maximum long-term benefit in alliance-chain-based P2P electricity trading.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is a flowchart illustrating steps of a deep reinforcement learning and federation chain-based energy transaction method according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a neural network model of an energy trading method based on deep reinforcement learning and a federation chain according to an embodiment of the present invention.
Fig. 3 is a framework diagram of an energy trading method federation chain based on deep reinforcement learning and federation chain according to an embodiment of the present invention.
Fig. 4 is a block diagram of an energy transaction apparatus based on deep reinforcement learning and federation chain according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the application provides an energy transaction method, device and equipment based on deep reinforcement learning and an alliance chain, used to solve the technical problem of how to enable buyers and sellers to obtain the maximum long-term benefit in alliance-chain-based P2P electricity trading. In this embodiment, a parking lot serves as the energy trading venue, and the traded energy is the electric quantity exchanged between the electric vehicles in the parking lot.
The first embodiment is as follows:
fig. 1 is a flowchart illustrating steps of an energy transaction method based on deep reinforcement learning and a federation chain according to an embodiment of the present invention, and fig. 2 is a schematic structural diagram illustrating a neural network model of the energy transaction method based on deep reinforcement learning and a federation chain according to an embodiment of the present invention.
As shown in fig. 1 and fig. 2, an embodiment of the present invention provides an energy transaction method based on deep reinforcement learning and a federation chain, including the following steps:
S10, collecting transaction characteristics of the energy trading venue, forming them into state vectors, and forming a first state matrix from the N state vectors in the venue at time t; the transaction characteristics comprise the remaining parking time of each electric vehicle in the venue, its buy/sell label, its transaction energy and its transaction price;
S20, inputting the first state matrix into a deep reinforcement learning neural network model and outputting an action matrix;
S30, computing a second state matrix and a reward matrix for time t+1 from the action matrix and the first state matrix through a state transfer function and a reward function; the first state matrix, the action matrix, the second state matrix and the reward matrix form a training matrix, which is stored in the replay pool of the neural network model;
S40, every Δt, sampling m training matrices from the replay pool to train the neural network model until its loss function converges or the maximum number of iterations is reached, obtaining a trained neural network training model;
and S50, in the energy trading venue, inputting the state matrix formed from the transaction characteristics of the buyers and sellers into the neural network training model to obtain the energy each buyer and seller trades.
In step S10 of the embodiment of the present invention, the characteristics of the electric vehicles in the parking lot are collected, including the remaining parking time δ of each electric vehicle, its buy/sell label z (the z values corresponding to a buyer and a seller are −1 and +1, respectively), the electric quantity e to be traded, and the transaction price p. These characteristics are combined into the state vector of one electric vehicle, and at time t the state vectors of all N electric vehicles in the parking lot form the first state matrix S_t.
It should be noted that step S10 mainly acquires the factors affecting the long-term profit of both trading parties in the energy trading venue.
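A minimal sketch of assembling the first state matrix S_t from the per-vehicle features listed above; the dictionary keys and the helper name `build_state_matrix` are hypothetical, and the column order (δ, z, e, p) is one plausible layout rather than the patent's exact convention.

```python
import numpy as np

def build_state_matrix(vehicles):
    """Stack per-vehicle state vectors (remaining parking time delta,
    buy/sell label z in {-1, +1}, trade quantity e, trade price p) into
    the first state matrix S_t, one row per electric vehicle."""
    return np.array(
        [[v["delta"], v["z"], v["e"], v["p"]] for v in vehicles],
        dtype=float,
    )
```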
In steps S20 to S40 of the embodiment of the present invention, the features in the state matrix are processed and learned by the neural network model to obtain the action matrix; the action matrix and the first state matrix S_t are then processed by the state transfer function and the reward function to obtain the state matrix at time t+1 (namely the second state matrix S_{t+1}) and the reward matrix R_t. The first state matrix S_t, the action matrix A_t, the second state matrix S_{t+1} and the reward matrix R_t form a training matrix (S_t, A_t, S_{t+1}, R_t), which serves as training data for the neural network model of the energy trading venue. Training yields a neural network training model that can learn the dynamic change of the parking lot (energy trading venue) state and the optimal action matrix adapted to that change, so that the long-term income of the trading electric vehicles under dynamic change reaches its maximum.
It should be noted that, as shown in fig. 2, the neural network model includes a Critic evaluation network, a Critic target network, an Actor evaluation network, and an Actor target network. The Critic evaluation network and the Critic target network have the same structure and are mainly used to calculate the Q values of states and actions; the Actor evaluation network and the Actor target network have the same structure and are mainly responsible for selecting the action with the highest Q value according to the state and outputting the action matrix. The state data obtained by the neural network model is input into the Actor evaluation network, which outputs an action; the action output by the Actor evaluation network interacts with the environment to obtain the next state data and the reward data, and these data are stored. Once enough data has accumulated, the data are randomly sampled, and the network parameters of the Critic evaluation network, the Critic target network, the Actor evaluation network and the Actor target network are updated. The Critic evaluation network and the Actor evaluation network are updated with a gradient descent algorithm; the Critic target network and the Actor target network are updated with a soft-update algorithm.
In step S40 of the embodiment of the present invention, steps S10 to S30 are repeated to collect a sufficient amount of training data for training the neural network model, so as to improve the accuracy of the results output by the neural network training model.
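The replay pool of steps S30 and S40 — store the tuples (S_t, A_t, S_{t+1}, R_t), then periodically draw a mini-batch of m tuples at random — can be sketched as follows. The capacity and batch size are illustrative choices, not values from the patent:

```python
import random
from collections import deque

# Hedged sketch of the replay pool: a bounded buffer of training tuples with
# uniform random mini-batch sampling, as used in steps S30-S40.

class ReplayPool:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)   # oldest tuples fall out automatically

    def store(self, s, a, s_next, r):
        self.buffer.append((s, a, s_next, r))

    def sample(self, m):
        # draw m tuples without replacement (fewer if the pool is still small)
        return random.sample(self.buffer, min(m, len(self.buffer)))

pool = ReplayPool()
for step in range(100):
    pool.store([step], [step], [step + 1], [1.0])
batch = pool.sample(32)
```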
In step S50 of the embodiment of the present invention, the obtained neural network training model is mainly combined with the alliance chain: the state matrix of the buyers and sellers that need to trade is input into the neural network training model, which outputs the energy to be traded between them; the buyer and the seller carry out the energy transaction according to this energy and the price, and then settle the currency transaction on the alliance chain, which ensures the privacy of their transaction.
According to the transaction energy output by the neural network training model, the seller electric vehicle transfers the corresponding electric quantity to the charging/discharging pile of the parking lot, completing the electric-quantity transfer of the transaction; the buyer electric vehicle then makes the currency payment to the seller vehicle on the alliance chain, and the seller vehicle signs and confirms after receiving the proceeds of the traded electric quantity, completing the transaction. At regular intervals, the different parking lots pack the transaction records of that interval into blocks, take turns acting as bookkeeper, and append each block to the end of the blockchain after it is signed and confirmed by the other parking lots. If the state of a charging vehicle in the parking lot changes at time t+1, the neural network training model acquires the new state data of the state matrix, and a new round of electric-vehicle electric-quantity trading begins.
The invention provides an energy trading method based on deep reinforcement learning and an alliance chain. N state vectors that influence buyers and sellers in the energy trading floor are collected to form a first state matrix; the state matrix is processed and analyzed in the neural network model to obtain an action matrix, a second state matrix and a reward matrix; and the neural network model is trained with the first state matrix, the action matrix, the second state matrix and the reward matrix to obtain a neural network training model. When the energy trading method based on the neural network training model and the alliance chain is applied to P2P electric-quantity trading of electric vehicles, the long-term profit of the electric vehicles participating in the trading is maximized, and the introduction of the alliance chain ensures the privacy and security of the electric-quantity trading. This solves the technical problem of how to let buyers and sellers obtain the maximum long-term benefit in alliance-chain-based P2P electric-quantity trading.
In an embodiment of the present invention, before the training matrix is stored into the replay pool of the neural network model, the method further includes: performing abnormal-value processing on the first state matrix S_t, the action matrix A_t and the second state matrix S_{t+1};
wherein values of the first state matrix S_t and the second state matrix S_{t+1} that are not within the preset value range are deleted and replaced with 0;
and element values of the action matrix A_t for which the buyer's price is less than the seller's price are deleted and replaced with 0.
Note that before the training matrix (S_t, A_t, S_{t+1}, R_t) is stored into the replay pool of the neural network model, abnormal-value processing is performed. This includes deleting values of the state matrices S_t and S_{t+1} that are not within the preset range and replacing them with 0: for example, if a value z in the state matrix is neither -1 nor 1, or if a price p in the state matrix is not within the preset price range required by the buyer, that value is deleted and then replaced with 0. It further includes deleting the element values of the action matrix A_t that do not satisfy the price condition (one quantity being the price at which buyer electric vehicle i purchases energy at time t, the other being the element of the action matrix, i.e. the amount of power purchased by electric vehicle i from electric vehicle j at time t), that is, elements for which the buyer's price is less than the seller's price, and replacing them with 0.
In one embodiment of the invention, outputting the action matrix A_t specifically includes: cutting the vector a output by the neural network model into N vectors, each containing N elements; the N vectors form the N×N action matrix A_t, wherein each element of a vector is the energy that one electric vehicle i trades with an electric vehicle j among the other N-1 vehicles.
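Cutting the flat Actor output into the N×N action matrix is a simple reshape, sketched here in pure Python (the sample values are illustrative):

```python
# Sketch: the Actor outputs a flat vector a of length N*N, which is cut into
# N vectors of N elements each; element A_t[i][j] is the energy that
# vehicle i trades with vehicle j.

def to_action_matrix(a, n):
    assert len(a) == n * n
    return [a[i * n:(i + 1) * n] for i in range(n)]

A_t = to_action_matrix([0.0, 0.4, 0.6,
                        0.0, 0.0, 0.0,
                        0.0, 0.0, 0.0], 3)
```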
The action matrix A_t is output by the Actor network of the neural network model, whose hidden layers compute α_{h+1} = Act_1(W_{h+1}α_h + b_{h+1}),
where α_h is the output of the h-th hidden layer of the Actor network and at the same time the input of the (h+1)-th hidden layer, W_{h+1} and b_{h+1} are respectively the weight matrix and bias of the (h+1)-th hidden layer, and Act_1(·) is the activation function of the Actor network's hidden layers. The hidden-layer activation function Act_1(·) uses the ReLU function, whose mathematical expression is Act_1(x) = max(0, x),
where x is the function variable. The ReLU function is the hidden-layer activation function of the neural network model; compared with other activation functions such as the sigmoid, the ReLU activates only a few neurons at a time, which guarantees the sparsity of the neural network, makes computation efficient, accelerates the convergence of the network weights, and mitigates the vanishing-gradient problem. If the neural network model suffers from dying neurons, the ReLU function can be replaced with the similar LeakyReLU function, whose mathematical expression is Act_1(x) = x for x > 0 and Act_1(x) = Cx for x ≤ 0, where C is a constant.
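The two hidden-layer activations discussed above can be written in a few lines; the LeakyReLU slope 0.01 is an illustrative choice for the constant C:

```python
# Scalar sketches of the hidden-layer activations: ReLU and LeakyReLU.

def relu(x):
    return x if x > 0 else 0.0

def leaky_relu(x, c=0.01):
    # c is the constant C above: a small slope for negative inputs,
    # which keeps "dead" neurons trainable
    return x if x > 0 else c * x
```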
The output layer of the Actor network computes a = Act_2(W_{H+1}α_H + b_{H+1}), where H is the total number of hidden layers, α_H is the output of the last hidden layer and at the same time the input of the output layer, W_{H+1} and b_{H+1} are respectively the weight matrix and bias of the output layer, and Act_2(·) is the activation function of the Actor network's output layer. The activation function Act_2(·) uses the Softmax function, whose mathematical expression is Act_2(x_i) = e^{x_i} / Σ_j e^{x_j},
where x_i is the input of the i-th node of the Actor network's output layer. Compared with other activation functions of a neural network model, the output values of the Softmax function sum to 1, so in the energy trading method based on deep reinforcement learning and the alliance chain they can be read directly as the percentage of the total required electric quantity that a buyer electric vehicle purchases from each of the other seller electric vehicles; no additional calculation is needed, which is efficient.
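A minimal Softmax over the Actor output layer, showing the sum-to-one property that lets each entry be read as a purchase fraction (the max-subtraction is a standard numerical-stability trick, not part of the patent's formula):

```python
import math

# Softmax: outputs are positive and sum to 1, so each entry can be read as
# the fraction of the buyer's total demand purchased from one seller.

def softmax(xs):
    m = max(xs)                               # subtract max for stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

fractions = softmax([2.0, 1.0, 0.1])
```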
In one embodiment of the invention, the expression of the state transfer function f(S_t, A_t) is:
where s_i^{t+1} is the second state vector of electric vehicle i, and δ_i^{t+1}, e_i^{t+1}, p_i^{t+1} and z_i^{t+1} respectively denote the remaining parking time, the transaction electric quantity, the transaction price and the buying/selling label of electric vehicle i at time t+1;
the expression of the transaction electric quantity required by the electric automobile i at the moment t +1 is as follows:
where e_i^t is the transaction electric quantity required by electric vehicle i at time t, z_i^t is the buying/selling label of electric vehicle i at time t, and a_{ij}^t is the element of the action matrix giving the energy purchased by electric vehicle i from electric vehicle j at time t;
where μ_1 and σ_1^2 are respectively the mean and variance of the normal distribution satisfied by the variable x;
wherein, when the transaction of electric vehicle i is not yet complete, the expressions for the remaining parking time and the buying/selling label of electric vehicle i at time t+1 are:
where δ_i^t is the remaining parking time of electric vehicle i at time t;
when the transaction of electric vehicle i has been completed, the expressions for the remaining parking time, the transaction electric quantity and the buying/selling label of electric vehicle i at time t+1 are:
where μ_2 and σ_2^2 are respectively the mean and variance of the normal distribution satisfied by the stay time of an electric vehicle in the energy trading floor, and μ_3 and σ_3^2 are respectively the mean and variance of the normal distribution satisfied by the energy that an electric vehicle needs to trade.
In addition, when the state matrix S_t transitions to the state matrix S_{t+1}, the remaining parking time δ and the transaction electric quantity e among the features of the state matrix are determined by the action matrix A_t taken at the transition; the buying/selling label z is related to the remaining electric quantity of the vehicle; and the transaction price p of the electric quantity may fluctuate randomly. Specifically, the relation between the remaining parking time δ and the transaction electric quantity e at time t+1 and their values at time t is embodied in the state transfer function f(S_t, A_t), while the transaction price p is not determined by it. The state transfer function accounts, in different ways, for whether the buying/selling label z changes, for the fluctuation of the transaction price p over time, and for the departure of electric vehicles after completing their trades and the arrival of new electric vehicles. For electric vehicles with incomplete transactions and for electric vehicles that have completed a transaction, the variable x is a random variable whose N state vectors satisfy a normal distribution; U(0, 1) denotes that the variable x satisfies a uniform distribution over the interval (0, 1), i.e., x is a random number in (0, 1). For example, at time t+1 the probability that an electric vehicle becomes an electricity buyer or a seller is 0.5.
In one embodiment of the invention, the expression of the reward matrix R_t is:
where r_i^t is the reward of electric vehicle i at time t; k_1, k_2, k_3 and k_4 are constants; s_t is the penalty factor at time t; a_{ij}^t is the element of the action matrix giving the energy purchased by electric vehicle i from electric vehicle j at time t; e_i^t is the energy required by electric vehicle i at time t; p_i^t is the price of the energy of electric vehicle i at time t; and z_i^t is the value of the buying/selling label of electric vehicle i at time t.
In an embodiment of the present invention, training the neural network model specifically includes: iteratively updating the parameters of the Critic network and the Actor network until the loss functions converge or the maximum number of iterations is reached, wherein the Critic network includes a Critic evaluation network and a Critic target network, and the Actor network includes an Actor evaluation network and an Actor target network;
wherein the loss function of the Critic network is:
the loss function for an Actor network is:
where L_1 is the loss of the Actor network; L_2 is the loss of the Critic network; γ is the discount coefficient; q_k is the output of the Critic evaluation network, representing the Q value corresponding to sample k; q'_k is the output of the Critic target network, representing the Q value at the next moment of sample k; k ∈ {1, 2, …, m}; and the mathematical expressions of q_k and q'_k are:
q_k = ReLU(W_2 S_k + W_3 A_k + b_2)
q'_k = ReLU(W'_2 S_{k+1} + W'_3 μ'(S_{k+1}) + b'_2)
where W_2 and W_3 are both weight matrices of the output layer of the Critic evaluation network, b_2 is the bias vector of the output layer of the Critic evaluation network, W'_2 and W'_3 are weight matrices of the output layer of the Critic target network, b'_2 is the bias vector of the output layer of the Critic target network, and μ'(S_{k+1}) is the output obtained by feeding S_{k+1} into the Actor target network, representing the optimal action matrix corresponding to state S_{k+1};
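The loss expressions themselves appear as images in the original publication; the definitions above (q_k, q'_k, discount coefficient γ, batch size m) are consistent with the standard DDPG-style losses — a mean-squared TD error for the Critic and the mean negative Q value for the Actor — so the sketch below assumes that standard form:

```python
# Hedged sketch, assuming the standard actor-critic losses that the
# surrounding definitions match; not a verbatim transcription of the patent.

def critic_loss(rewards, q_next, q, gamma=0.95):
    # L2 = (1/m) * sum over the batch of (r_k + gamma * q'_k - q_k)^2
    m = len(q)
    return sum((r + gamma * qn - qv) ** 2
               for r, qn, qv in zip(rewards, q_next, q)) / m

def actor_loss(q):
    # L1 = -(1/m) * sum of q_k: raising the Q value lowers the Actor loss
    return -sum(q) / len(q)
```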
the back propagation algorithm expression for iteratively updating the parameters of the Actor valuation network and the criticic valuation network is as follows:
the soft update algorithm expression for iteratively updating the parameters of the Actor valuation network and the criticic valuation network is as follows:
W′←τW+(1-τ)W′
where W denotes the parameters of the Critic evaluation network and the Actor evaluation network (e.g. W_2, W_3, b_2), ∇ denotes the gradient of the two loss functions L_1 and L_2, W' denotes the parameters of the Critic target network and the Actor target network, α is the training factor with value range [0, 1), and τ is a coefficient controlling the influence of the old target-network parameters W' and the evaluation-network parameters W on the target network.
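The soft update W' ← τW + (1 - τ)W' above can be sketched element-wise over flat parameter lists; the value τ = 0.5 here is purely illustrative:

```python
# Soft update: blend a fraction tau of the evaluation-network parameters
# into the target-network parameters, instead of copying them outright.

def soft_update(target, source, tau=0.01):
    return [tau * w + (1.0 - tau) * w_old
            for w, w_old in zip(source, target)]

target = [0.0, 0.0]          # old target-network parameters W'
source = [1.0, 2.0]          # current evaluation-network parameters W
target = soft_update(target, source, tau=0.5)
```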
Here m is a natural number. In the loss function L_2 of the Critic network, the value of the discount coefficient γ may be 0.95, 0.98, 0.999, etc.; the discount coefficient mainly controls the influence of the Q value at time t+1 on the action at the current time t. In this embodiment, γ is 0.95.
In the embodiment of the invention, the parameters of the Actor evaluation network and the Critic evaluation network are updated using the back-propagation algorithm, where the value of α may be 0.5, 0.1, 0.01, etc. The training factor mainly controls the training speed and effect of the neural network model; the training factor α is preferably 0.01.
In the embodiment of the invention, after the parameters of the Critic evaluation network and the Actor evaluation network have been updated k times, the soft-update algorithm is used to update the parameters of the Critic target network and the Actor target network. Unlike other DRL algorithms, which directly copy the evaluation-network parameters to the target network, the soft-update algorithm combines the old target-network parameters with the new evaluation-network parameters, so a more accurate neural network training model is obtained through training.
Fig. 3 is a framework diagram of the alliance chain of the energy trading method based on deep reinforcement learning and alliance chain according to an embodiment of the present invention.
In an embodiment of the present invention, obtaining the matching of buyers and sellers in the energy trading floor specifically further includes: at a given moment, a state matrix formed from the transaction characteristics of the buyers and sellers that need to trade is input into the neural network training model, and the neural network training model outputs the action matrix of the transaction.
It should be noted that, in the energy trading method based on deep reinforcement learning and the alliance chain, the final transaction is realized through the alliance chain in order to ensure the privacy and security of the transaction and to make the transaction result public, transparent and tamper-proof. As shown in fig. 3, the alliance chain model is composed of the electric vehicles and the energy trading floor (i.e., the parking lot). After entering the parking lot, an electric vehicle initiates a registration application to the energy trading floor and joins the alliance chain. At each moment, each electric vehicle uploads its state to the energy trading floor; the energy trading floor collects the states of all charging vehicles and inputs them into the neural network training model to obtain the optimal action, namely the pairing of buyers and sellers and the specific transaction electric quantity, and then returns the result and the corresponding electronic-wallet addresses to the two electric vehicles of each transaction. The trading electric vehicle transfers the electric quantity to the charging/discharging pile in the parking lot according to the obtained matching result. At the next moment, the buyer electric vehicle transfers money to the electronic wallet of the seller electric vehicle, and the seller electric vehicle obtains the payment for the electric quantity actually sold from its electronic wallet. Finally, at regular intervals, the energy trading floors of the different parking lots pack the transaction records of that interval into a block of the alliance chain, take turns acting as bookkeeper, and append the block to the end of the alliance blockchain after it is signed and confirmed by the other energy trading floors.
Wherein the energy trading floor is also referred to as an intermediary.
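The bookkeeping step — pack an interval's transaction records into a block, link it to the previous block, and append after confirmation — can be sketched with a simple hash chain. The record fields and the bare SHA-256 linkage are illustrative assumptions, not the patent's concrete consensus or signature scheme:

```python
import hashlib
import json

# Hedged sketch of interval bookkeeping on the alliance chain: each block
# carries its bookkeeper (a parking lot), the records of the interval, and
# the hash of the previous block, making the chain tamper-evident.

def make_block(records, prev_hash, bookkeeper):
    body = {"bookkeeper": bookkeeper, "prev": prev_hash, "records": records}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {**body, "hash": digest}

chain = [make_block([], "0" * 64, "lot-A")]          # genesis block
records = [{"buyer": "EV1", "seller": "EV2", "kwh": 12.5, "price": 0.8}]
chain.append(make_block(records, chain[-1]["hash"], "lot-B"))  # lot-B's turn
```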
In the embodiment of the invention, the energy trading method based on deep reinforcement learning and the alliance chain handles the changes of the electric-vehicle P2P electric-quantity trading scene with the state, action, state transfer function and reward function of a deep reinforcement learning neural network model, so that the state matrix S_t, the action matrix A_t and the reward matrix R_t in the obtained neural network training model conform to the P2P electric-quantity trading of charging electric vehicles in the parking lot.
Example two:
fig. 4 is a block diagram of an energy transaction apparatus based on deep reinforcement learning and federation chain according to an embodiment of the present invention.
As shown in fig. 4, an embodiment of the present invention further provides an energy trading device based on deep reinforcement learning and alliance chain, which includes a data acquisition module 10, a first processing module 20, a second processing module 30, a training module 40, and an output module 50:
the data acquisition module 10 is used for acquiring transaction characteristics of the energy transaction field, forming the transaction characteristics into a state vector, and forming a first state matrix by N state vectors in the energy transaction field at the moment t; the transaction characteristics comprise the time of the electric automobile remaining in the energy transaction field, a trading label, transaction energy and a transaction price;
the first processing module 20 is used for inputting the first state matrix into the deep reinforcement learning neural network model and outputting an action matrix;
the second processing module 30 is configured to calculate the action matrix and the first state matrix through a state transfer function and a reward function to obtain a second state matrix and a reward matrix at a time t + 1; the first state matrix, the action matrix, the second state matrix and the reward matrix form a training matrix, and the training matrix is stored in a playback pool of the neural network model;
the training module 40 is configured to acquire m pieces of data of the training matrix from the playback pool of the neural network model every Δ t time to train the neural network model until a loss function of the neural network model converges or iterates to the maximum number of times, so as to obtain a trained neural network training model;
and the output module 50 is used for inputting the state matrix formed by the transaction characteristics required by the buyer and the seller in the energy trading field into the neural network training model to obtain the energy traded by the buyer and the seller.
It should be noted that, the modules in the apparatus according to the second embodiment correspond to the steps in the method according to the first embodiment, the steps of the method have been described in detail in the first embodiment, and the contents of the modules are not described in detail in the second embodiment.
Example three:
embodiments of the present invention provide a computer-readable storage medium for storing computer instructions that, when executed on a computer, cause the computer to perform the above-described deep reinforcement learning and federation chain-based energy trading method.
Example four:
the embodiment of the invention provides terminal equipment, which comprises a processor and a memory;
a memory for storing the program code and transmitting the program code to the processor;
and the processor is used for executing the energy trading method based on the deep reinforcement learning and the alliance chain according to the instructions in the program codes.
It should be noted that the processor is configured to execute, according to the instructions in the program code, the steps of the above-described embodiments of the energy trading method based on deep reinforcement learning and alliance chain. Alternatively, when executing the computer program, the processor implements the functions of each module/unit in the system/apparatus embodiments described above.
Illustratively, a computer program may be partitioned into one or more modules/units, which are stored in a memory and executed by a processor to accomplish the present application. One or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of a computer program in a terminal device.
The terminal device may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The terminal device may include, but is not limited to, a processor, a memory. Those skilled in the art will appreciate that the terminal device is not limited and may include more or fewer components than those shown, or some components may be combined, or different components, e.g., the terminal device may also include input output devices, network access devices, buses, etc.
The Processor may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The storage may be an internal storage unit of the terminal device, such as a hard disk or a memory of the terminal device. The memory may also be an external storage device of the terminal device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the terminal device. Further, the memory may also include both an internal storage unit of the terminal device and an external storage device. The memory is used for storing computer programs and other programs and data required by the terminal device. The memory may also be used to temporarily store data that has been output or is to be output.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. An energy transaction method based on deep reinforcement learning and alliance chain is applied to electric quantity transaction of an electric vehicle, and is characterized by comprising the following steps:
s10, collecting transaction characteristics of an energy transaction field, forming the transaction characteristics into a state vector, and forming a first state matrix by N state vectors in the energy transaction field at the moment t; the transaction characteristics comprise the time of the electric automobile remaining stopped in an energy transaction field, a trading label, transaction energy and a transaction price;
s20, inputting the first state matrix into a deep reinforcement learning neural network model, and outputting an action matrix;
s30, calculating the action matrix and the first state matrix through a state transfer function and a reward function to obtain a second state matrix and a reward matrix at the moment of t + 1; the first state matrix, the action matrix, the second state matrix and the reward matrix form a training matrix, and the training matrix is stored in a playback pool of the neural network model;
S40, acquiring data of m training matrices from the replay pool of the neural network model at every interval Δt to train the neural network model until the loss function of the neural network model converges or the maximum number of iterations is reached, so as to obtain a trained neural network training model;
and S50, in the energy trading place, inputting a state matrix formed by the trading features required by the buyer and the seller into the neural network training model to obtain the trading energy of the buyer and the seller.
2. The energy trading method based on deep reinforcement learning and alliance chain of claim 1, wherein before storing the training matrix into a replay pool of the neural network model, the method further comprises: processing abnormal values of the first state matrix, the action matrix and the second state matrix;
deleting the numerical values of the first state matrix and the second state matrix which are not in the preset value range, and supplementing 0;
and deleting the element values in the action matrix, which meet the condition that the price of the buyer is less than that of the seller, and supplementing 0.
3. The energy trading method based on deep reinforcement learning and alliance chain of claim 1, wherein outputting the action matrix specifically comprises: cutting a vector output by the neural network model into N vectors, each vector comprising N elements, the N vectors constituting the N×N action matrix; wherein each element of a vector is the energy that one electric vehicle trades with one of the other N-1 electric vehicles.
4. The energy trading method based on deep reinforcement learning and alliance chain of claim 1, wherein the expression of the state transfer function f(S_t, A_t) is:
where s_i^{t+1} is the second state vector of electric vehicle i, and δ_i^{t+1}, e_i^{t+1}, p_i^{t+1} and z_i^{t+1} respectively denote the remaining parking time, the transaction electric quantity, the transaction price and the buying/selling label of electric vehicle i at time t+1;
the expression of the transaction electric quantity required by the electric automobile i at the moment t +1 is as follows:
where e_i^t is the transaction electric quantity required by electric vehicle i at time t, z_i^t is the buying/selling label of electric vehicle i at time t, and a_{ij}^t is the element of the action matrix giving the energy purchased by electric vehicle i from electric vehicle j at time t;
where μ_1 and σ_1^2 are respectively the mean and variance of the normal distribution satisfied by the variable x;
wherein whenThen, the remaining parking time and the buying and selling label expression of the electric automobile i at the moment t +1 are as follows:
in the formula (I), the compound is shown in the specification,the rest parking time of the electric automobile i at the moment t is obtained;
when in useThen, the expressions of the remaining parking time, the transaction electric quantity and the buying and selling label of the electric automobile i at the moment t +1 are as follows:
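The formulas of claim 4 appear only as images in the specification. Purely as one plausible, non-authoritative reading of the prose (the sign convention for the buy/sell label and both function names are assumptions), the trade-quantity and parking-time updates might be sketched as:

```python
import numpy as np

def next_trade_quantity(e_t, label_t, actions_row):
    """Assumed update: the energy still to be traded by vehicle i
    decreases by the energy exchanged this step; label_t is +1 for a
    buyer and -1 for a seller (assumed convention, not from the claim)."""
    return e_t - label_t * np.sum(actions_row)

def next_parking_time(t_remaining):
    """Remaining parking time decrements by one step while positive."""
    return max(t_remaining - 1, 0)
```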
5. The deep reinforcement learning and alliance chain based energy trading method of claim 1, wherein the reward matrix R_t has the expression:
where the terms denote, respectively: the reward of electric vehicle i at time t; the constants k1, k2, k3 and k4; the penalty factor s_t at time t; the element of the action matrix giving the energy purchased by electric vehicle i from electric vehicle j at time t; the energy required by electric vehicle i at time t; the energy price set by electric vehicle i at time t; and the value of the buy/sell label of electric vehicle i at time t.
6. The deep reinforcement learning and alliance chain based energy trading method of claim 1, wherein training the neural network model specifically comprises: iteratively updating the parameters of a Critic network and an Actor network until the loss function converges or the maximum number of iterations is reached, wherein the Critic network comprises a Critic evaluation network and a Critic target network, and the Actor network comprises an Actor evaluation network and an Actor target network;
wherein the loss function of the Critic network is:
the loss function of the Actor network is as follows:
where L1 is the loss of the Actor network; L2 is the loss of the Critic network; γ is the discount coefficient; q_k is the output of the Critic evaluation network, representing the Q value corresponding to sample k; q'_k is the output of the Critic target network, representing the Q value at the next moment of sample k; k ∈ {1, 2, …, m}; and q_k and q'_k are given by:
q_k = ReLU(W2·S_k + W3·A_k + b2)
q'_k = ReLU(W'2·S_{k+1} + W'3·μ'(S_{k+1}) + b'2)
where W2 and W3 are both weight matrices of the output layer of the Critic evaluation network, b2 is the bias vector of the output layer of the Critic evaluation network, W'2 and W'3 are weight matrices of the output layer of the Critic target network, b'2 is the bias vector of the output layer of the Critic target network, and μ'(S_{k+1}) is the output obtained by feeding S_{k+1} into the Actor target network, representing the optimal action matrix corresponding to state S_{k+1};
the back-propagation algorithm expression for iteratively updating the parameters of the Actor evaluation network and the Critic evaluation network is:
the soft update algorithm expression for iteratively updating the parameters of the Actor target network and the Critic target network is:
W' ← τW + (1 − τ)W'
where W denotes the parameters W2, W3, b2, … of the Critic evaluation network and the Actor evaluation network, the gradients are those of the two loss functions L1 and L2 with respect to W, W' denotes the parameters of the Critic target network and the Actor target network, α is a training factor with value range (0, 1), and τ is a coefficient controlling the influence that the old target-network parameters W' and the evaluation-network parameters W have on the target network.
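As an illustrative sketch only (shapes, parameter handling and function names are assumptions; the claim does not fix an implementation), the Q-value computation q_k = ReLU(W2·S_k + W3·A_k + b2) and the soft update W' ← τW + (1−τ)W' of claim 6 can be written in NumPy as:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def critic_q(W2, W3, b2, s_k, a_k):
    """Output-layer Q value of the Critic evaluation network:
    q_k = ReLU(W2 @ s_k + W3 @ a_k + b2)."""
    return relu(W2 @ s_k + W3 @ a_k + b2)

def soft_update(target_params, eval_params, tau):
    """W' <- tau * W + (1 - tau) * W' for each parameter pair; tau
    controls how strongly the evaluation-network parameters W pull
    the target-network parameters W' toward them."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(eval_params, target_params)]
```

A small τ keeps the target network slowly tracking the evaluation network, which is the standard stabilization device in actor-critic training of this kind.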
7. The deep reinforcement learning and alliance chain based energy trading method of claim 1, wherein obtaining matched buyer and seller trades in the energy trading floor further comprises: at a given moment, inputting a state matrix formed from the transaction characteristics of the buyers and sellers needing to trade into the trained neural network model, which outputs the action matrix of the trades.
8. An energy transaction device, comprising a data acquisition module, a first processing module, a second processing module, a training module and an output module, wherein:
the data acquisition module is used for acquiring the transaction characteristics of the energy trading floor and forming them into state vectors; at time t, the N state vectors in the energy trading floor form a first state matrix; the transaction characteristics comprise the remaining parking time of the electric vehicle in the energy trading floor, the buy/sell label, the transaction energy and the transaction price;
the first processing module is used for inputting the first state matrix into a deep reinforcement learning neural network model and outputting an action matrix;
the second processing module is used for computing, from the action matrix and the first state matrix via the state transfer function and the reward function, the second state matrix and the reward matrix at time t+1; the first state matrix, the action matrix, the second state matrix and the reward matrix form a training matrix, which is stored in a replay pool of the neural network model;
the training module is used for acquiring m pieces of training-matrix data from the replay pool of the neural network model at every interval Δt to train the neural network model until its loss function converges or the maximum number of iterations is reached, obtaining a trained neural network model;
and the output module is used for inputting a state matrix formed from the transaction characteristics of the buyers and sellers in the energy trading floor into the trained neural network model, obtaining the energy traded between the buyers and sellers.
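The replay pool that the training module samples from can be sketched minimally as follows (class and method names are illustrative assumptions; the claim only requires storing training tuples and drawing m of them per interval):

```python
import random
from collections import deque

class ReplayPool:
    """Minimal replay pool: stores (S_t, A_t, S_{t+1}, R_t) training
    tuples and yields random minibatches of m samples for training."""

    def __init__(self, capacity=10000):
        # deque with maxlen discards the oldest tuples once full
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, s_next, r):
        self.buffer.append((s, a, s_next, r))

    def sample(self, m):
        # draw without replacement; cap m at the pool's current size
        return random.sample(self.buffer, min(m, len(self.buffer)))
```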
9. A computer-readable storage medium for storing computer instructions which, when executed on a computer, cause the computer to perform the deep reinforcement learning and alliance chain based energy trading method of any one of claims 1 to 7.
10. A terminal device comprising a processor and a memory;
the memory is used for storing program code and transmitting the program code to the processor;
and the processor is configured to execute the deep reinforcement learning and alliance chain based energy trading method of any one of claims 1 to 7 according to the instructions in the program code.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011420188.7A CN112419064B (en) | 2020-12-07 | 2020-12-07 | Energy transaction method, device and equipment based on deep reinforcement learning and alliance chain |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112419064A true CN112419064A (en) | 2021-02-26 |
CN112419064B CN112419064B (en) | 2022-02-08 |
Family
ID=74775865
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011420188.7A Active CN112419064B (en) | 2020-12-07 | 2020-12-07 | Energy transaction method, device and equipment based on deep reinforcement learning and alliance chain |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112419064B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114202229A (en) * | 2021-12-20 | 2022-03-18 | 南方电网数字电网研究院有限公司 | Method and device for determining energy management strategy, computer equipment and storage medium |
US20230063075A1 (en) * | 2021-07-27 | 2023-03-02 | Tata Consultancy Services Limited | Method and system to generate pricing for charging electric vehicles |
CN117078347A (en) * | 2023-08-28 | 2023-11-17 | 合肥工业大学 | Electric-carbon integrated transaction method based on alliance chain |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107423978A (en) * | 2017-06-16 | 2017-12-01 | 郑州大学 | A kind of distributed energy business confirmation method based on alliance's block chain |
CN108038545A (en) * | 2017-12-06 | 2018-05-15 | 湖北工业大学 | Fast learning algorithm based on Actor-Critic neutral net continuous controls |
CN108985940A (en) * | 2018-07-18 | 2018-12-11 | 国网能源研究院有限公司 | Power exchange management system and method between a kind of user based on block chain technology |
CN109003082A (en) * | 2018-07-24 | 2018-12-14 | 电子科技大学 | PHEV power exchange system and its method of commerce based on alliance's block chain |
CN109784926A (en) * | 2019-01-22 | 2019-05-21 | 华北电力大学(保定) | A kind of virtual plant internal market method of commerce and system based on alliance's block chain |
CN110349027A (en) * | 2019-07-19 | 2019-10-18 | 湘潭大学 | Pairs trade system based on deeply study |
CN110378693A (en) * | 2019-07-11 | 2019-10-25 | 合肥工业大学 | Distributed energy weak center trade managing system based on alliance's block chain |
CN110458443A (en) * | 2019-08-07 | 2019-11-15 | 南京邮电大学 | A kind of wisdom home energy management method and system based on deeply study |
US20200160411A1 (en) * | 2018-11-16 | 2020-05-21 | Mitsubishi Electric Research Laboratories, Inc. | Methods and Systems for Optimal Joint Bidding and Pricing of Load Serving Entity |
CN111815369A (en) * | 2020-07-31 | 2020-10-23 | 上海交通大学 | Multi-energy system energy scheduling method based on deep reinforcement learning |
Non-Patent Citations (5)
Title |
---|
JIAWEN KANG ET AL.: "Enabling Localized Peer-to-Peer Electricity Trading Among Plug-in Hybrid Electric Vehicles Using Consortium Blockchains", IEEE Transactions on Industrial Informatics *
TIMOTHY P. LILLICRAP ET AL.: "Continuous Control with Deep Reinforcement Learning", arXiv *
YANG LI ET AL.: "Deep Robust Reinforcement Learning for Practical Algorithmic Trading", IEEE Access *
LIU JIANWEI ET AL.: "A Survey of Deep Reinforcement Learning Based on Value Function and Policy Gradient", Chinese Journal of Computers *
QI YUE ET AL.: "Portfolio Management Based on the Deep Reinforcement Learning DDPG Algorithm", Computer and Modernization *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20230063075A1 (en) * | 2021-07-27 | 2023-03-02 | Tata Consultancy Services Limited | Method and system to generate pricing for charging electric vehicles |
CN114202229A (en) * | 2021-12-20 | 2022-03-18 | 南方电网数字电网研究院有限公司 | Method and device for determining energy management strategy, computer equipment and storage medium |
CN114202229B (en) * | 2021-12-20 | 2023-06-30 | 南方电网数字电网研究院有限公司 | Determining method of energy management strategy of micro-grid based on deep reinforcement learning |
CN117078347A (en) * | 2023-08-28 | 2023-11-17 | 合肥工业大学 | Electric-carbon integrated transaction method based on alliance chain |
Also Published As
Publication number | Publication date |
---|---|
CN112419064B (en) | 2022-02-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112419064B (en) | Energy transaction method, device and equipment based on deep reinforcement learning and alliance chain | |
CN109034915B (en) | Artificial intelligent electronic commerce system capable of using digital assets or points as transaction media | |
Bergemann et al. | Sequential information disclosure in auctions | |
CA3177410A1 (en) | Market orchestration system for facilitating electronic marketplace transactions | |
Qiu et al. | Multi-Agent Reinforcement Learning for Automated Peer-to-Peer Energy Trading in Double-Side Auction Market. | |
Yassine et al. | Double auction mechanisms for dynamic autonomous electric vehicles energy trading | |
Zhang et al. | EV charging bidding by multi-DQN reinforcement learning in electricity auction market | |
Backus et al. | Dynamic demand estimation in auction markets | |
Gong et al. | Split-award contracts with investment | |
Keniston | Bargaining and welfare: A dynamic structural analysis | |
Ray et al. | Supplier behavior modeling and winner determination using parallel MDP | |
Alsenani | The participation of electric vehicles in a peer-to-peer energy-backed token market | |
Carvalho | On a participation structure that ensures representative prices in prediction markets | |
KR20140100632A (en) | Resale system for repeating sale goods and method of the same | |
Clempner | A dynamic mechanism design for controllable and ergodic markov games | |
Fostel et al. | Endogenous leverage: VaR and beyond | |
CN110782338A (en) | Loan transaction risk prediction method and device, computer equipment and storage medium | |
Cheng et al. | Recent studies of agent incentives in internet resource allocation and pricing | |
Withanawasam et al. | Characterising trader manipulation in a limit-order driven market | |
Özer et al. | Multi-unit differential auction–barter model for electronic marketplaces | |
Kim | Maximizing sellers’ welfare in online auction by simulating bidders’ proxy bidding agents | |
Dong et al. | Unilateral counterparty risk valuation of CDS using a regime-switching intensity model | |
Uhryn et al. | Modelling a System for Intelligent Forecasting of Trading on Stock Exchanges | |
Zhang et al. | A deep reinforcement learning-based bidding strategy for participants in a peer-to-peer energy trading scenario | |
Laskey et al. | Combinatorial prediction markets for fusing information from distributed experts and models |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 20230407
Address after: Room 601, Building B1, 136 Kaiyuan Avenue, Huangpu District, Guangzhou City, Guangdong Province, 510000
Patentee after: Guangzhou Huihui Intelligent Technology Co.,Ltd.
Address before: 510275 No. 135 West Xingang Road, Guangzhou, Guangdong
Patentee before: SUN YAT-SEN University