CN113435938B - Distributed characteristic data selection method in electric power spot market - Google Patents

Distributed characteristic data selection method in electric power spot market Download PDF

Info

Publication number
CN113435938B
CN113435938B CN202110763209.3A CN202110763209A CN113435938B CN 113435938 B CN113435938 B CN 113435938B CN 202110763209 A CN202110763209 A CN 202110763209A CN 113435938 B CN113435938 B CN 113435938B
Authority
CN
China
Prior art keywords
data
representing
data set
learning model
buyer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110763209.3A
Other languages
Chinese (zh)
Other versions
CN113435938A (en
Inventor
李俊
胡本然
关心
胡妤飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mudanjiang university
State Grid Heilongjiang Electric Power Co Ltd
Heilongjiang University
Original Assignee
Mudanjiang university
State Grid Heilongjiang Electric Power Co Ltd
Heilongjiang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mudanjiang university, State Grid Heilongjiang Electric Power Co Ltd, Heilongjiang University filed Critical Mudanjiang university
Priority to CN202110763209.3A priority Critical patent/CN113435938B/en
Publication of CN113435938A publication Critical patent/CN113435938A/en
Application granted granted Critical
Publication of CN113435938B publication Critical patent/CN113435938B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1097Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Abstract

The distributed characteristic data selection method in the power spot market solves the problem that the cost and the accuracy of building a learning model cannot be simultaneously considered when the data are selected in the prior art, and belongs to the field of power data analysis. Comprising the following steps: the data buyer end determines a user side learning model, a sample data set, and the type and the data quantity of the missing power data, forms a query and sends the query to the data seller end, and the data seller end returns a corresponding given data set to the data buyer end; the data buyer end jointly optimizes the accuracy of a user side learning model, user payment, edge server task processing delay and block chain uploading delay, and aims at maximizing the accuracy of the user side learning model, minimizing payment and delay, so as to establish an objective function; and solving the objective function, selecting characteristic data conforming to the objective function in a given data set, uploading the selected characteristic data to a blockchain by a data seller terminal, paying and acquiring the data by the data buyer terminal through the blockchain, and adding the data into a sample data set.

Description

Distributed characteristic data selection method in electric power spot market
Technical Field
The invention relates to a distributed characteristic data selection method in an electric power spot market, and belongs to the field of electric power data analysis.
Background
With the rapid development of information technology, various edge devices in the energy internet have generated a large amount of power data describing meaningful information. Complex techniques can be used to obtain business needs and financial insights from such power data, making power data resources an essential production element and strategic resource for human society. Knowledge discovery, which aims to discover basic knowledge from a large amount of power data, is a hot topic in both academia and industry due to important financial and social values hidden in various power data.
However, for large amounts of power data in the energy internet, financial knowledge discovery in the power spot market is very difficult. Data driven decisions are popular in financial knowledge discovery, which is changing scientific, business and social activities. Such an approach can provide knowledge about small data quickly and accurately, assuming that the characteristic data is redundant in a given dataset, and therefore, they will not provide convincing valuable knowledge if there is insufficient characteristic data. However, the lack of feature data is a common phenomenon in financial knowledge discovery, as financial activities often contain complex features, often owned by different organizations. Although some feature data can be generated from the history data to complete the learning task, there are still feature data that are difficult to accurately generate, which makes acquiring features a major obstacle to learning financial knowledge. Thus, it is necessary to purchase feature data from different feature data vendors. However, since budgets are typically limited, this is critical to design strategies to trade off the cost of feature data against the accuracy of the user-side learning model.
Determining the importance of features is also challenging for purchasers, which also determines the strategy to get the best performance of the learning model in the case of budget limitation. Power signature data selection is an effective technique to address this problem. The purpose of the power feature data selection is to find the most appropriate subset of power features from the original feature set so that the built learning model is better and faster. In addition, the characteristic data selection can accelerate the data processing speed, and the calculation cost is saved.
Power signature data selection has been widely studied in data mining and machine learning. Although extensive research has been conducted, most existing feature data selection studies do not take into account data costs, and collecting all of the information of the training data can be very expensive. Furthermore, due to the ever increasing amount of data and the number of data dimensions, there is a problem with a single machine that is computationally intensive, both in terms of storage and computation.
Disclosure of Invention
Aiming at the problem that the feature data subset found from the original power feature data set of the existing power spot market cannot simultaneously achieve cost and accuracy of building a learning model, the invention provides a distributed feature data selection method in the power spot market.
The invention relates to a distributed characteristic data selection method in an electric power spot market, which is realized based on a block chain system, wherein the system comprises a data seller terminal, a data buyer terminal, an edge computing server and a block chain;
the method comprises the following steps:
s1, a data buyer determines a user side learning model, a sample data set and the type and the data quantity of power data lacking in the sample data set, forms a query according to the type and the data quantity of the power data lacking, and sends the query to the data seller, and the data seller returns a corresponding given data set to the data buyer;
s2, the data buyer end jointly optimizes the accuracy of a user side learning model, user payment, task processing delay and uploading delay of a block chain, maximizes the accuracy of the user side learning model, aims at minimizing payment and delay, and establishes an objective function:
Figure BDA0003149768890000021
s.t.φ d (x)≤budget
Figure BDA0003149768890000022
Figure BDA0003149768890000023
0≤|x|≤Size
wherein phi is d (x)=φ upd (x)·(1+kAcc(x))+userpr;beha=Num Buy /Num Query The method comprises the steps of carrying out a first treatment on the surface of the When beha is less than or equal to 0.1, userpr is more than or equal to 0; when beha>0.1,userpr<0; x and λ represent inputs, x= (x 1 ,x 2 ,...,x n ),x i ∈{0,1},i=1,2,...,n, x i Feature data representing an ith type of a given dataset, n representing a number of types of features in the given dataset;
Figure BDA0003149768890000028
λ a representing the ratio of the ith task to the total task of the data buyer side offloaded to the edge server, 0.ltoreq.lambda a ≤1; d a Representing the size of the a-th task input data; lambda (lambda) a d a Representing a computational task required on an edge computing node EN of an edge server; acc (x) represents the accuracy of the learning model at the user side, 0<Acc(x)<1, a step of; alpha represents the precision parameter of the learning model at the user side; beta represents a payment parameter; ζ represents a parameter of data buyer side task processing delay; η represents a parameter of a blockchain delay; phi (phi) d (x) Representing a price; phi (phi) upd (x) Representing static pricing irrespective of the user-side learning model; userpr represents a user behavioral rewards/penalty variable; pena represents the lower limit of the user behavior penalty variable, rewa represents the upper limit of the user behavior penalty variable, pena is less than or equal to userpr is less than or equal to rewa; k represents a price adjustment parameter; the budgets represent budgets; />
Figure BDA0003149768890000024
Representing the processing delay of the a-th task; />
Figure BDA0003149768890000025
A maximum limit value representing a task processing delay; />
Figure BDA0003149768890000026
Representing local tasks (1-lambda a )d a Is a local calculation time of (1); />
Figure BDA0003149768890000027
Representing the task lambda to be calculated a d a From user data end U a Transmission time of edge computing node EN offloaded to edge server, +.>
Figure BDA0003149768890000031
Representing a computing task lambda to be accessed wirelessly a d a The computation time offloaded onto the edge computation node EN of the edge server; a=1, 2, …, N E ,N E Representing the number of tasks; / >
Figure BDA0003149768890000032
Represents the time of the transaction between the data buyer and the data seller, b=1, 2, …, N B ,N B Representing the number of blocks in the blockchain; />
Figure BDA0003149768890000033
A maximum limit value representing a transaction achievement time; />
Figure BDA0003149768890000034
Representing a b-th block packing time of the block chain; />
Figure BDA0003149768890000035
Representing the consensus time of the b-th block; />
Figure BDA0003149768890000036
Indicating the commit time of the b-th block; num (Num) Query Indicates the number of queries, num Buy Representing the number of purchases;
Figure BDA0003149768890000037
a maximum limit value representing the number of queries; />
Figure BDA0003149768890000038
A maximum limit value indicating the number of purchases; size represents the maximum limit of the amount of data that a user queries and purchases in real time;
s3, solving the objective function, selecting characteristic data which accords with the objective function in a given data set, uploading the selected characteristic data to a block chain by a data seller terminal, paying and acquiring the data through the block chain by a data buyer terminal, and adding the data into a sample data set.
Preferably, the method is characterized in that,
Figure BDA0003149768890000039
Figure BDA00031497688900000310
d represents a given data set, including m tuples, each tuple having n types of features, the query issued by the data buyer is Q, t j Tuple representing the result of query Q at D, j=1, 2, … m; in D there is a table T 1 ,T 2 ,...,T tn
Figure BDA00031497688900000311
Is T 1 ,T 2 ,...,T tn Tn represents the number of sub-tables in D,/-sub-table>
Figure BDA00031497688900000312
Representing query Q versus D table T i A set of lineage tuples;
Figure BDA00031497688900000313
representing query Q versus D table T j Is a set of uncertain lineage tuples; />
Figure BDA00031497688900000314
Figure BDA00031497688900000315
Representing tuple t j Data quality of (2); sen (0)<sen<1) Indicating the sensitivity of the user to the quality; delta represents a price coefficient for controlling a user price range; />
Figure BDA00031497688900000316
Figure BDA00031497688900000317
T now A set of lineage tuples representing current non-purchased data; p is p total Representing the overall price for a given dataset; ζ represents a coefficient of information entropy; />
Figure BDA00031497688900000319
Coefficients representing the integrity rate; integrity of j-th tuple->
Figure BDA00031497688900000318
index ij =1 indicates that the element of the ith row and j column of the n row and m column feature data of a given dataset is present, and if not present, index ij =0; h is the information entropy of a given dataset; h (t) j ) Information entropy of the j-th tuple; w= (w) 1 ,w 2 ,...,w n ),w j A weight vector representing the j-th tuple of feature data; w (w) min Representing the minimum value, w, of the weight vector max Representing the maximum value of the weight vector.
Preferably, when α is not equal to 1, β is not equal to 0, ζ is not equal to 0, η is not equal to 0, and when the objective function is solved, the S3 calculates SU values of each feature data in the given dataset by using the SUFS algorithm, uses the SU values as weight vectors of the tuple feature data, and adopts a solver programming mode to solve the objective function.
Preferably, α=1, β=0, ζ=0, η=0;
searching a set of main features in n features in a given dataset by adopting SUFS algorithm Sign S best Calculate each feature data S best Is based on the threshold delta at the set of principal features S best Is obtained by selecting characteristic data
Figure BDA0003149768890000041
And according to->
Figure BDA0003149768890000042
SU values of the characteristic data in (a) are ordered in descending order, redundant characteristic data are deleted, and the deleted +.>
Figure BDA0003149768890000043
The characteristic data which accords with the objective function and is selected by the data buyer side in a given data set;
Figure BDA0003149768890000044
SU (X, Y) represents SU value;
Figure BDA0003149768890000045
IG(X|Y)=H(X)-H(X|Y)
h (x|y), IG (x|y) represents information gain, X represents random event X, i.e., a type of feature is selected; y represents a random event Y, i.e. selecting another type of feature data; h (X) represents the information entropy of event X; h (Y) represents the information entropy of event Y; p (y) q ) Representing random event Y as Y q Probability of (2); p (x) p |y q ) Representing random event Y as Y q Under the condition that the random event X is X p Conditional probability of (2); x is x p Represents one of X, y q And represents a certain class of X.
As a preferred alternative to this,
Figure BDA0003149768890000046
preferably, in the step S1, the method for determining the sample data set at the data buyer side includes:
data buyer side owns local data set D own
Data buyer end-to-local data set D own Repairing to obtain data set
Figure BDA0003149768890000047
Data buyer end-to-local data set D own Predicting to obtain data set
Figure BDA0003149768890000051
In dataset D own
Figure BDA0003149768890000052
And->
Figure BDA0003149768890000053
And training the user side learning model, determining the type and the data quantity of the power data which are lack in the sample data set when the accuracy of the user side learning model is lower than the required accuracy, forming a query according to the type and the data quantity of the power data which are lack, and sending the query to the data seller terminal, and returning the corresponding given data set to the data buyer terminal by the data seller terminal.
Preferably, the data buyer end pairs the local data set D own Repairing to obtain data set
Figure BDA0003149768890000054
Or the data buyer end to the local data set D own Prediction is carried out to obtain a data set +.>
Figure BDA0003149768890000055
The method of (1) comprises:
segmentation dataset D own Obtaining a training data set D train And test dataset D test Establishing a deep learning model,
with training dataset D train For deep learning modelTraining the model, outputting the parameters of the trained deep learning model and the loss value of each iteration, and using the test data set D test Predicting to obtain an error value of the deep learning model prediction, adjusting parameters of the deep learning model, and repairing or predicting by using the deep learning model;
the deep learning model comprises two bidirectional LSTM layers, a multi-head attention layer, a maximum pooling layer, an average pooling layer and two fully connected layers, and a training data set D train The input data of the two-way LSTM layers are input to the two-way LSTM layers at the same time, the output of the two-way LSTM layers is input to the multi-head attention layer, the output of the multi-head attention layer is input to the maximum pooling layer and the average pooling layer at the same time, the output of the maximum pooling layer and the average pooling layer is input to one full-connection layer at the same time, the output of the full-connection layer is input to the other full-connection layer, and the output of the full-connection layer is output through the output layer.
The invention provides a block chain-based framework for buying the function with limited budget through the calculation of the edge server in the energy internet, and the buying and selling parties of data can upload transaction data and transaction information to the alliance block chain, so that safe data transaction and data sharing are realized. The objective function established by the invention jointly optimizes the accuracy of the learning model at the user side, the payment of the user, the processing delay of the task and the uploading delay of the block chain, selects data in a given data set of a data seller terminal, and realizes the maximization of the accuracy of the learning model at the user side and the minimization of the payment and the delay.
Drawings
FIG. 1 is a schematic diagram of a framework constructed in accordance with the present invention;
FIG. 2 is a diagram illustrating the communication between a data seller and a data buyer;
FIG. 3 is a schematic diagram of an edge server;
FIG. 4 is a schematic diagram of a blockchain;
FIG. 5 is a schematic diagram of an atten-LSTM predictive model;
FIG. 6 is a comparison of the predictive effect of an atten-LSTM predictive model with real data;
FIG. 7 is a graph showing loss values for an atten-LSTM prediction model;
FIG. 8 is a graph showing the effect of LSTM and GRU and the atten-LSTM predictive model of the present invention in comparison to real data.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
The invention is further described below with reference to the drawings and specific examples, which are not intended to be limiting.
The distributed feature data selection method in the electric power spot market of the embodiment is implemented by using a blockchain-based system, wherein the system comprises a data seller terminal, a data buyer terminal, an edge server and a blockchain;
1) A data seller terminal. They own the data and determine the data pricing mechanism. When a request for a characteristic query is received at the data buyer, they will provide a static price for the characteristic. The data seller terminal can dynamically adjust the price of the feature according to the model accuracy given by the data buyer terminal by taking the feature set as a unit. They offer the proper price of the owned feature to maximize their revenue.
2) And the data buyer end. For some learning tasks, the data vendor side may use a transducer model to generate new data from existing historical data to implement some learning tasks. However, for characteristics that are difficult to generate, a query request for such characteristics must be initiated to the data vendor side. The data vendor side may choose the function of the purchase based on a limited budget and accuracy of the model.
3) And an edge server. There are some servers in the edge computing network. They have sufficient computing power to run some time consuming algorithms, such as reinforcement learning algorithms or deep learning algorithms. The data seller side or the data buyer side may offload some of the computing tasks to some edge servers.
4) A blockchain. To secure the data, the selected feature data is uploaded to the blockchain for sharing between the seller and the buyer. Both the data seller side and the data buyer side are members of the blockchain license.
The distributed characteristic data selection method in the electric power spot market of the embodiment comprises the following steps:
determining a user side learning model, a sample data set and the type and the data quantity of power data lacking in the sample data set by a data buyer side, forming a query according to the type and the data quantity of the power data lacking, and sending the query to the data seller side, and returning a corresponding given data set to the data buyer side by the data seller side;
step two, the data buyer end jointly optimizes the accuracy of a user side learning model, user payment, task processing delay and uploading delay of a block chain, and aims at maximizing the accuracy of the user side learning model, minimizing the payment and the delay, and establishes a target function:
Figure BDA0003149768890000071
s.t.φ d (x)≤budget
Figure BDA0003149768890000072
Figure BDA0003149768890000073
0≤|x|≤Size
wherein phi is d (x)=φ upd (x)·(1+kAcc(x))+userpr;beha=Num Buy /Num Query The method comprises the steps of carrying out a first treatment on the surface of the When beha is less than or equal to 0.1, userpr is more than or equal to 0; when (when)beha>0.1,userpr<0; x and λ represent inputs, x= (x 1 ,x 2 ,...,x n ),x i ∈{0,1},i=1,2,...,n,x i Feature data representing an ith type of a given dataset, n representing a number of types of features in the given dataset; lambda= (lambda) 12 ,...,λ NE ),λ a Representing the ratio of the ith task to the total task of the data buyer side offloaded to the edge server, 0.ltoreq.lambda a ≤1;d a Representing the size of the a-th task input data; lambda (lambda) a d a Representing a computing task required on an edge computing node EN of an edge server; acc (x) represents the accuracy of the learning model at the user side, 0 <Acc(x)<1, a step of; alpha represents the precision parameter of the learning model at the user side; beta represents a payment parameter; ζ represents a parameter of data buyer side task processing delay; η represents a parameter of a blockchain delay; phi (phi) d (x) Representing a price; phi (phi) upd (x) Representing static pricing irrespective of the user-side learning model; userpr represents a user behavioral rewards/penalty variable; pena represents the lower limit of the user behavior penalty variable, rewa represents the upper limit of the user behavior penalty variable, pena is less than or equal to userpr is less than or equal to rewa; k represents a price adjustment parameter; the budgets represent budgets;
Figure BDA0003149768890000074
representing the processing delay of the a-th task; />
Figure BDA0003149768890000075
A maximum limit value representing a task processing delay; />
Figure BDA0003149768890000076
Representing local tasks (1-lambda a )d a Is a local calculation time of (1); />
Figure BDA0003149768890000077
Representing the task lambda to be calculated a d a From user data end U a Transmission time of edge computing node EN offloaded to edge server, +.>
Figure BDA0003149768890000078
Representing a computing task lambda to be accessed by wireless a d a The computation time offloaded onto the edge computation node EN of the edge server; a=1, 2, …, N E ,N E Representing the number of tasks; />
Figure BDA0003149768890000079
Represents the time of the transaction between the data buyer and the data seller, b=1, 2, …, N B ,N B Representing the number of blocks in the blockchain; />
Figure BDA00031497688900000710
A maximum limit value representing a transaction achievement time; / >
Figure BDA00031497688900000711
Representing a b-th block packing time of the block chain; />
Figure BDA00031497688900000712
Representing the consensus time of the b-th block; />
Figure BDA00031497688900000713
Indicating the commit time of the b-th block; num (Num) Query Indicates the number of queries, num Buy Representing the number of purchases;
Figure BDA0003149768890000081
Figure BDA0003149768890000082
Figure BDA0003149768890000083
a maximum limit value representing the number of queries; />
Figure BDA0003149768890000084
A maximum limit value representing the number of purchases; size represents the maximum limit of the amount of data that a user queries and purchases in real time;
step three, solving the objective function, selecting characteristic data which accords with the objective function in a given data set, uploading the selected characteristic data to a block chain by a data seller terminal, paying and acquiring the data by the data buyer terminal through the block chain, and adding the data into a sample data set.
The goal of the problem in this embodiment is to maximize the benefit of the buyer, i.e. to maximize accuracy and minimize payment. The decision variables are { x, lambda }, alpha, beta, zeta and eta, and can be set and adjusted through experiments.
Step two of the present embodiment includes constraints on data pricing:
for maximum benefit to the data seller side, static pricing based on horizontal and vertical pricing and dynamic pricing based on data content and query counts are included. The static part takes into account factors of data incompleteness, query incompleteness, repeated charges for historical queries, and data updates. Wherein the horizontal pricing takes the tuple as a minimum pricing unit; vertical pricing takes features as minimum pricing units. The dynamic part considers factors such as user behavior, user side model accuracy and the like.
The data analyst data buyer side of a power plant is performing a knowledge discovery task. The data buyer wants to study the power trading trends in different regions. However, the data buyer has only partially incomplete power transaction history data. Data sets with such information are partly present in their own power plants and partly in other power plants, which are expensive to exceed the budget. The data is purchased as little as possible, the data buyer side can know the trend of the historical data, and the local historical data is used for generating new data for the tasks of the data buyer side. Because data is difficult to learn, the data buyer side can issue several queries on the data set owned by the data seller and purchase at the appropriate price. In this way, the data buyer side does not pay a high price for all data sets used in the task.
As shown in graph 1, the sample dataset has three relational power plants (G), prices (P), consumers (C). In particular, the generated data and price data in the sample data set are incomplete due to loss of privacy protection or equipment failure.
TABLE 1
Figure BDA0003149768890000091
1) Static pricing:
the static pricing component of the present embodiment is determined based on the entropy of the information and the integrity of the data. For horizontal pricing, an incomplete sample data set D is given m×n Has m tuples, each tuple has n characteristics, and the total price is P total . Due to different task preferences, the tuple feature weight vector is w= (W 1 ,w 2 ,...,w m ) Wherein w is j (1. Ltoreq.j.ltoreq.m) represents the weight of the j-th tuple. Thus, the j-th tuple t j The integrity of (c) is defined as follows:
Figure BDA0003149768890000092
wherein index ij =1 indicates that the element of the ith row, j and column is present, otherwise index ij =0。
Entropy is a measure of uncertainty in random variables, and information entropy is often used as a quantization index for information content. Let A be a discrete random variable and entropy be defined as
Figure BDA0003149768890000093
Thus the j-th tuple t j The price of (2) is as follows:
Figure BDA0003149768890000094
/>
Figure BDA0003149768890000095
wherein h (t) j ) And h is the information entropy of the j-th tuple, and h is the information entropy of the whole data set. ζ is,
Figure BDA0003149768890000096
Coefficients of information entropy and completeness rate, respectively, which satisfy the constraint +.>
Figure BDA0003149768890000108
The basic idea of static price is to scale the overall price according to the degree of integrity of the tuples and the amount of information.
Initially, the data buyer side initiates a query Q to the data set d= (G, C, P) 1 The power load and total annual load data for city a, 2020.12 months, were investigated as shown in table 2. The data buyer side then initiates a query Q 2 And (5) investigating the electricity price of the market A. When data buyer end sends Q 3 At the time of inquiry, p 3 The GID absence of the stripe data, as shown in table 1, results in inaccurate queries. For a data seller to profit from the data buyer side, important data, such as characteristics, can be sold, a key issue being how to price the data quickly and reasonably.
Table 2 SQL statement
Figure BDA0003149768890000101
TABLE 3 Table 3
Figure BDA0003149768890000102
First, the present embodiment uses the concept of a data lineage to determine which data is used. Given tuple t, the exact underlying data that yields t is called its lineage. In other words, each tuple t appears in the output of the query, a group of tuples in the input, called the lineage of t. Intuitively, the lineage definition of t refers to collecting all of the input data that "contributes" t or helps "produce" t, as described in definition 1.
Definition 1 (tuple lineage). Given data set D, there is a table T 1 ,T 2 ,...,T tn And query Q. Let Q (D) =q (T 1 ,T 2 ,...,T tn ) In Table T for query Q 1 ,T 2 ,...,T tn Results set above. For a tuple t.epsilon.Q (D), a lineage set of T refers to Q in Table T 1 ,T 2 ,...,T tn The above expression L (t ε Q (D), D) (abbreviated as L (t, D)) is defined by formula (5):
Figure BDA0003149768890000103
Figure BDA0003149768890000104
a vector form of a set of spectral coefficients of formula (6) t, each element
Figure BDA0003149768890000105
From T j Is a tuple of (a). For j=1..,
Figure BDA0003149768890000106
is T is at T j And T is of the lineage of j Is helpful in generating a result tuple t. Formally, a->
Figure BDA0003149768890000107
Is T 1 ,T 2 ,...,T tn Is a subset of (1)
Figure BDA0003149768890000111
Figure BDA0003149768890000112
/>
Then, one of the result sets Q (D) is queriedA set of spectral coefficients, denoted M (Q, D), is a joint set of spectral sets L (t, D), from each result tuple t e Q (D), i.e., M (Q, D) = u t∈ Q (D) L (t, D). It shows that the data usage of query Q is evaluated by M (Q, D).
The query result is ambiguous when the data is incomplete, wherein the root cause of the ambiguous query result is the absence of a key. For tuple t j By parameter miss j To indicate the degree of bond missing
Figure BDA0003149768890000113
Where j is a missing bond. Thus, tuple t j Mass τ (t) j ) Representation of
Figure BDA0003149768890000117
Wherein sen (0)<sen<1) Representing the user's sensitivity to quality, the parameter sen may be dynamically adjusted based on historical purchases of the data consumer. To some extent, the degree of variation in tuple quality can be measured. The smaller the sen value, the faster the mass value changes (i.e., mass versus mass j The more sensitive).
Given an incomplete data set D and a query Q. The quality σ (Q, D) of the query Q is expressed as
Figure BDA0003149768890000114
Thus, the price function for a query (Q, D) on an incomplete dataset is defined as follows
Figure BDA0003149768890000115
Where n is the number of result tuples, i.e., n= |q (D) |, Δ is the price coefficient used to control the user price range, M (Q, D) is the set of lineage tuples of the query result Q (D).
After the data buyer purchases a query, the data buyer issues a new query. At this point, the results of the new query and some of the data already purchased are repeated. Thus, the pricing mechanism should prevent the buyer from paying excessive fees in view of the historical queries. When the information for t is not updated, the buyer can use the repeated lineage tuples for free and therefore pricing is as follows
Figure BDA0003149768890000116
T now =M(Q,D)-T buy
Wherein T is buy Representing the set of lineage tuples that have been purchased, this function avoids repeated charges for historical queries.
When the data is updated, T buy The middle part of the data changes, so an additional attribute ver is added to t, representing the version number of the tuple. The version number is initialized to 0 and incremented each time a tuple is updated. The system only retains the latest version of each tuple, priced as follows
Figure BDA0003149768890000121
T n o w =M(Q,D)-T buy +T upd (14)
Wherein T is upd Representing the set of tuples that have been purchased for which ver has changed.
For vertical pricing, similar to equation 3, the price for the ith feature is defined as follows:
Figure BDA0003149768890000122
/>
Figure BDA0003149768890000123
Figure BDA0003149768890000124
μ+η=1
wherein mnum i Representing the number of missing values of the ith feature
2) Dynamic pricing: user behavior indicators are considered in the pricing method. The data buyer side initiates a query to the data seller side, and then the data seller side returns the query price phi (Q, D), and the data buyer side selects to pay money, purchase data, or abandon the purchase after obtaining the offer. Let the number of queries at the data buyer end be Num Query Number of purchases of Num Buy , beha=Num Buy /Num Query Is a user behavior index (beha is more than or equal to 0 and less than or equal to 1). When beha is less than or equal to 0.1, the method indicates that the query times of the data buyer end are more, but the purchase times are less, and if the data set is larger, the query delay is high, the calculation is complex, and the system resources are wasted. Thus, the present embodiment adds a user behavior rewards/penalties mechanism userpr; when beha is less than or equal to 0.1, userpr is less than or equal to 0, indicating that the user must pay a higher fee. Otherwise, the user pays a lower fee. Pricing function of
φ user (Q,D)=φ upd (Q,D)+userpr (19)
Wherein, the liquid crystal display device comprises a liquid crystal display device,
Figure BDA0003149768890000125
in addition, the present embodiment also considers the impact of user-side model accuracy on data pricing. The data buyer wants to investigate the power trade situation in 2020 in different areas. The data buyer wants to test the predictive capability of the generated energy of some new energy sources on the electric power exchange. Initially, without a model, the data buyer needs to send a query Q to the data seller 1 Buying data and training a model; after training, the accuracy of the model is kept stable, and a query Q is sent to the data seller terminal at the moment 2 And fine tuning is carried out on the model.The price of the two queries is also different due to the different impact on the model accuracy. However, Q is desirable in this embodiment 1 The query price is greater than Q 2 Therefore, this embodiment has
φ d (Q,D)=φ upd (Q,D)·(1+kAcc(Q,D))+userpr (21)
Let x= (Q, D), i.e
φ d (x)=φ upd (x)·(1+kAcc(x))+userpr (22)
Wherein Acc (x) is model accuracy, 0< Acc (x) <1. The formula applies equally to vertical pricing.
For example, when the accuracy is less than 0.7, the price is higher than the static price, when the accuracy is greater than 0.85, the price is lower than the static price, and when the accuracy is between 0.7 and 0.85, the price is slightly higher than the static price. The price is adjusted in units of feature sets, so can be set
Figure BDA0003149768890000131
At the same time, after the data buyer purchases the data once, the weight w of the feature changes, and thus the price of the data also changes.
Constraint on edge server computation delay in this embodiment:
each data terminal (including a data buyer side and a data seller side) has a calculation task, wherein the calculation task of the data seller side is to price data, and the calculation task of the buyer is knowledge discovery. The computing Task may be described as Task a (d a ,s a ),a∈N E Wherein d is a Representing the size of the input data, s a Is a computing Task a The required computing resources. Each task may be divided into two parts, calculated locally and calculated on the edge calculation node EN. Specifically lambda a (0≤λ a And less than or equal to 1) is an unloading ratio variable which represents the ratio of the unloading task to the total task. User terminal U a Lambda is set to a d a Data offloaded to EN, locallyCalculate the rest (1-lambda) a )d a Is a data of (a) a data of (b).
The task processing delay may be divided into two parts accordingly. The first part is that the task passes through the wireless channel from the node U a Transmission time to EN. The second part is the computation time, which depends on the allocated computing resources and task size.
1) Transmission time: lambda is set to a d a From U a The transfer time offloaded to EN can be written as
Figure BDA0003149768890000132
Wherein R is a Is the throughput. Throughput represents the amount of data that enters and passes through a system in a time slot, which may also be expressed in terms of data rate.
2) Calculating time: task for each Task a ,U a Can locally execute Task on own computing resource a (1-lambda) a )d a Part of the process and offload the rest to the edge computing node EN for processing. Let f a Representing U a Which varies for different users and can be obtained by offline measurements. Local calculation time
Figure BDA0003149768890000141
Can be expressed as
Figure BDA0003149768890000142
For edge computing node computing method, U a Lambda is set by wireless access a d a Is offloaded to EN. Calculation time on EN
Figure BDA0003149768890000143
Will be
Figure BDA0003149768890000144
3) Task processing delay: since each task can be split into two parts, executed locally in parallel and on EN in parallel, the task processing delay is determined by the maximum of the two parallel execution parts. If the task processing delay is determined by the local execution section, the task processing delay is equal to the local calculation time because the local execution section does not undergo the process of wireless transmission. If the task processing delay is determined by the EN execution portion, the task processing delay includes two portions: 1) Transmission time; 2) Calculation time on EN. Thus, task processing is delayed by
Figure BDA0003149768890000145
Constraint of blockchain delay in this embodiment:
blockchain transaction time: the block packing time of the block chain is T p Related to the data content of the transaction; the consensus time of the block is T C The commit time of the block is T s Related to the size of the block and the consensus mechanism. The time for the transaction to be completed is
T BC =T p +T C +T s (28)
In this embodiment, the feature selection problem is formulated:
the present embodiment represents a solution to the feature selection problem with a binary coded vector x, which is described as follows:
x=(x 1 ,x 2 ,...,x n ),x i ∈{0,1},i=1,2,...,n (29)
wherein x is i =1 indicates that the ith feature is selected, and x i =0 means that the feature is not selected.
(30-39)
In the embodiment, when α is not equal to 1, β is not equal to 0, ζ is not equal to 0, η is not equal to 0, and S3 calculates SU values of each feature data in a given data set by using a SUFS algorithm when the objective function is solved, and uses SU values as weight vectors of the tuple feature data, and solves the objective function by using a solver programming method.
In this embodiment, the feature selection is performed using the symmetry uncertainty as a measurement index. Comprising 2 aspects: (1) how to determine whether a feature is associated with a tag; (2) How to determine whether such relevant features are redundant when considering other relevant features;
when α=1, β=0, ζ=0, η=0; searching a group of main features S in n features in a given data set by adopting SUFS algorithm best Calculate each feature data S best Is based on the threshold delta at the set of principal features S best Is obtained by selecting characteristic data
Figure BDA0003149768890000151
And according to->
Figure BDA0003149768890000152
SU values of the characteristic data in (a) are ordered in descending order, redundant characteristic data are deleted, and the deleted +.>
Figure BDA0003149768890000153
The characteristic data which is selected by the data buyer side in the given data set and accords with the objective function is the characteristic data;
Figure BDA0003149768890000154
SU (X, Y) represents SU value;
Figure BDA0003149768890000155
IG(X|Y)=H(X)-H(X|Y)
h (x|y), IG (x|y) represents information gain, X represents random event X, i.e., a type of feature is selected; y represents a random event Y, i.e. selecting another type of feature data; h (X) represents event XIs an information entropy of (a); h (Y) represents the information entropy of event Y; p (y) q ) Representing random event Y as Y q Probability of (2); p (x) p |y q ) Representing random event Y as Y q Under the condition that the random event X is X p Conditional probability of (2); x is x p Represents one of X, y q And represents a certain class in X.
The information gain is symmetrical for two random variables X and Y. Symmetry is a desirable attribute for measuring correlation between features. However, the information gain is biased to facilitate a function with more values. Furthermore, the values must be normalized to ensure that they are comparable, with the same effect.
Step one of the present embodiment further includes recovery of the history data, generation of the history data, and selection of the features. The present embodiment is based on data restoration and generation of an attention mechanism, and uses data prediction of an attention mechanism, and in the present embodiment, a data buyer side has a local data set with missing data, which is time-series data, as shown in table 3. To better discover important information from existing data, the data buyer can repair the missing values and generate new data from the existing data. Time series data prediction is an effective data repair and generation method.
Self-attention, also known as internal attention, is a mechanism of attention associated with different positions of a single sequence, with the aim of calculating a representation of the sequence. Attention mechanisms have become an integral part of task sequence modeling for reading understanding, text implications, sequence prediction, etc. It allows modeling of dependent items in an input or output sequence irrespective of the distance between them. This embodiment combines a self-attention mechanism with an algorithm that loops the neural network,
in the first step, the method for determining the sample data set by the data buyer side comprises the following steps:
data buyer side owns local data set D own
Data buyer end-to-local data set D own Repairing to obtain data set
Figure BDA0003149768890000161
Data buyer end-to-local data set D own Predicting to obtain data set
Figure BDA0003149768890000162
In dataset D own
Figure BDA0003149768890000163
And->
Figure BDA0003149768890000164
When the accuracy of the user side learning model is lower than the required accuracy, determining the type and the data quantity of the power data lacking in the sample data set, forming a query according to the type and the data quantity of the power data lacking, and sending the query to a data seller terminal, and returning the corresponding given data set to the data buyer terminal by the data seller terminal.
Data buyer end-to-local data set D own Repairing to obtain data set
Figure BDA0003149768890000165
Or the data buyer end to the local data set D own Prediction is carried out to obtain a data set +.>
Figure BDA0003149768890000166
The method of (1) comprises:
segmentation dataset D own Obtaining a training data set D train And test dataset D test Establishing a deep learning model,
with training dataset D train Training the deep learning model, outputting the parameters of the trained deep learning model and the loss value of each iteration, and using the test data set D test Predicting to obtain an error value of the deep learning model prediction, adjusting parameters of the deep learning model, and repairing or predicting by using the deep learning model;
the deep learning model comprises two bidirectional LSTM layers, a multi-head attention layer, a maximum pooling layer, an average pooling layer and two fully connected layers, and a training data set D train The input data of the two-way LSTM layers are input to the two-way LSTM layers at the same time, the output of the two-way LSTM layers is input to the multi-head attention layer, the output of the multi-head attention layer is input to the maximum pooling layer and the average pooling layer at the same time, the output of the maximum pooling layer and the average pooling layer is input to one full-connection layer at the same time, the output of the full-connection layer is input to the other full-connection layer, and the output of the full-connection layer is output through the output layer.
The concept of the attention mechanism, as its name implies, is to pay attention to different features to predict the outcome. The self-care function is described as mapping a query and a set of key-value pairs to an output, where the query, key, value, and output are vectors. The output is calculated as a weighted sum of values, where the weight assigned to each value is calculated by querying the compatibility function with the corresponding key. For example, when a word is encoded, the representations (value vectors) of all words are weighted and summed, and the weights are obtained by the dot product of the word representation (key vector) and the encoded word representation (query vector) and by softmax.
For Scaled Dot-Product Attention, the input consists of a query with dimension d k A bond and a dimension d v Is comprised of values of (a). In general, this embodiment calculates the attention function on a query bundle packed as matrix Q, keys and values packed as matrices K and V, respectively. Thus, the output matrix is:
Figure BDA0003149768890000171
the multi-headed attention mechanism expands the ability of the model to focus on different locations. d, d model The key, value and attention function of the query of the dimension are projected linearly h times. After the attention function is executed in parallel, the results are connected and projected to obtain a final value.
Figure BDA0003149768890000172
Wherein the projection is a parameter matrix
Figure BDA0003149768890000173
And->
Figure BDA0003149768890000174
This embodiment employs h=3 parallel attention headers. For each of these, d is used k =d v =64,d m o del =300. In the atten-LSTM predictive algorithm model, the bi-directional LSTM layer is located before the multi-headed attention layer, as shown in FIG. 5.
LSTM includes 4 components: input gate, forget gate, cell status and output gate. Input gate i t Including the current input x t Last hidden layer h t-1 Last cell state c t-1 And a weight matrix W xi ,W hi ,W ci ,b i Determining how much new information to add:
i t =σ(W xi x t +W hi h t-1 +W ci c t-1 +b i ) (42)
forgetting door f t Including the current input x t Last hidden layer h t-1 Last cell state c t-1 And a weight matrix W xf , W hf ,W cf ,b f Determining how much old information was discarded:
f t =σ(W xf x t +W hf h t-1 +W cf c t-1 +b f ) (43)
the cell state is as follows:
c t =i t g t +f t c t-1 (44)
g t =tanh(W xc x t +W hc h t-1 +W cc c t-1 +b c ) (45)
the output gate includes the current input x t Last hidden layer h t-1 Current cell state c t And a weight matrix W xo ,W ho , W co ,b o Which information outputs are determined:
o t =σ(W xo x t +W ho h t-1 +W co c t +b o ) (46)
h t =o t tanh(c t ) (47)
and (3) experimental verification:
this embodiment conducted extensive experimentation to evaluate the performance of the proposed algorithm. First, the present embodiment tested the atten-LSTM predictive model using the underlying dataset in the UCI machine learning store. The atten-LSTM predictive model is then compared to existing temporal data predictive algorithms (LSTM, GRU) to demonstrate the superiority of the LSTM algorithm based on the self-attention mechanism used. Next, the present embodiment uses several benchmark data sets of the UCI machine learning store to evaluate the performance of the SUFS algorithm feature selection and give a pricing table. In the experiment, one benchmark was first selected from the dataset as the test sample, the remainder constituting the training sample. All experiments were performed on a personal computer using the Intel Core i79750H CPU 2.6ghz,16gb RAM and Windows 10 64bit of python 3.7.
Performance evaluation:
1) Performance of the atten-LSTM prediction model:
the present embodiment uses a time series data set to test the predictive performance of the atten-LSTM predictive model, with training set 7148 and training set 893. In this embodiment, the number of nodes in the bidirectional LSTM layer is 128, and the number of parallel heads in the multi-head attention mechanism is h=3, and d k =d v =64,d model =300. The prediction result is shown in fig. 6, and it can be seen from fig. 6 that the time-ordered data can be predicted well by the atten-LSTM algorithm.
The number of iterations of the atten-LSTM prediction model is 200, the mean square error (Mean Squared Error, MSE) is the loss, and the loss value for each iteration is shown in fig. 7. As can be seen from fig. 7, the algorithm reaches convergence at 50 iterations.
Further, in this embodiment, a conventional time-series data prediction algorithm, i.e., LSTM and GRU, is selected and compared with the algorithm proposed in this embodiment, where both LSTM and GRU use 128 node numbers. The comparison is shown in fig. 8, where the Root Mean Square Error (RMSE) of the atten_lstm prediction, LSTM prediction and GRU prediction are 0.0645,0.05473, 0.05582, respectively; the Mean Square Error (MSE) is 0.00396,0.00337,0.00327, respectively. It can be seen that the algorithm proposed in this embodiment has the same ability to predict time series data as the existing algorithm, and the average time per iteration is 1s 105us, which is less than 1s 125us and 1s 120us of LSTM and GRU, when training. Compared with RNN, the Attention mechanism has the advantage of parallel computation, and the training time is greatly reduced. In addition, RNNs themselves have some capture capability for long-range dependencies, but since the sequence model is made to flow through the gating unit, information is kept flowing and is selectively delivered. However, under the condition that the time sequence length is longer and longer, the capability of capturing the dependency relationship is lower and lower, and each recursion is accompanied by information loss, so that an Attention mechanism is added to enhance capturing of the part of the dependency relationship focused by the embodiment.
2) Performance of feature selection based on symmetry uncertainty:
the present embodiment uses the BreastCancer benchmark dataset of the UCI machine learning store to evaluate the performance of feature selection based on symmetry uncertainty and to give a pricing table. Using Symmetry Uncertainty (SU) as a measure, good features for classification are selected based on a correlation analysis of the features, including tags. Among them, two aspects are considered: (1) how to determine whether a feature is associated with a tag; (2) How to determine whether such related features are redundant when considering other related features. And sorting the feature data according to the feature correlation analysis based on the symmetry uncertainty, and deleting redundant features, wherein a threshold value of 0.05 is set. The main features are 1,2,4,5,6,7,8,9 features, the most main feature is 2 nd feature, the redundant feature is 3 rd feature, and the redundant feature is deleted.
After feature ordering is obtained, the present embodiment sets feature weights w=10×su+n-rank according to SU of the feature, i.e. (w) l ,w 2 ,…,w n )=10(SU 1,C ,SU 2,C ,…,SU n,C ) +n-rank as shown in Table 4.
TABLE 4 SU value, weight and number of missing values for the features
Features (e.g. a character) SU Weighting of Number of missing values
1 0.233938 3.33938 1
2 0.419128 12.19128 4
3 0.386141 9.86141 5
4 0.287368 4.87368 2
5 0.319727 8.19727 1
6 0.391315 10.91315 15
7 0.296451 5.96451 1
8 0.319566 7.19566 2
9 0.206039 2.06039 3
Totals to 2.473536 64.39677 34
3) Influence of pricing mechanism:
Static vertical pricing is performed according to equation (15), with μ=η=0.5. Note also that redundant feature 3 and feature 2 need only be purchased one. Static vertical pricing of features is shown in table 5.
Table 5 data pricing
Features (e.g. a character) Static vertical pricing Dynamic pricing
1 10.10491 9.77227
2 15.20727 14.95914
3 13.75983 13.48772
4 9.20084 8.85323
5 12.01888 11.71795
6 13.24595 12.96534
7 11.47236 11.16237
8 10.63601 10.31217
9 4.353939 3.926051
Totals to 100 97.15626
According to the greedy algorithm, the data buyer first purchases the feature with the greatest influence on the tag, namely the 2 nd feature. After the data buyer obtains the characteristics, the characteristics are utilized to train the model, and the model accuracy Acc (x) is obtained. Setting the trained classification model as a logistic regression model, training for 200 times by using the 2 nd column of characteristic data to obtain the average accuracy rate of 0.82813, wherein the average accuracy rate is smaller than the target accuracy rate of 0.95 of the data buyer side, as shown in the table, so that the data buyer side then initiates a query purchase application to the data seller. Because the data buyer has purchased the 2 nd column of features and trained the model, the feature ordering and feature weights provided by the seller change, and the data pricing changes dynamically.
TABLE 6
Feature set Accuracy rate of
{1} 0.79625730994152
{2} 0.82812865497076
…… ……
{2,3,4,5,6,7,8,9} 0.982456140350877
{1,2,3,4,5,6,7,8,9} 0.964912280701754
According to feature selection based on symmetry uncertainty and formulas (19) - (23), num of data buyer side is set Query =5, Num Buy =1, userpr= -0.5, so the price change is shown in table 6. At the position of
Figure BDA0003149768890000201
In the above, let α=1, β=0, and widget=50, that is, the data buyer needs to be within the budget 50, so as to obtain the data with the highest accuracy of the model. According to the greedy algorithm, the feature set purchased by the data buyer is {2,6,5,8}. Experiments prove that the model and the method provided by the embodiment enable a data buyer to select more effective characteristics under limited budget, and the data seller can price the data dynamically according to the requirements of users.
Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims. It should be understood that the different dependent claims and the features described herein may be combined in ways other than as described in the original claims. It is also to be understood that features described in connection with separate embodiments may be used in other described embodiments.

Claims (5)

1. The distributed characteristic data selection method in the electric power spot market is characterized in that the method is realized based on a block chain system, and the system comprises a data seller terminal, a data buyer terminal, an edge calculation server and a block chain;
The method comprises the following steps:
s1, a data buyer determines a user side learning model, a sample data set and the type and the data quantity of power data lacking in the sample data set, forms a query according to the type and the data quantity of the power data lacking, and sends the query to the data seller, and the data seller returns a corresponding given data set to the data buyer;
s2, the data buyer end jointly optimizes the accuracy of a user side learning model, user payment, task processing delay and uploading delay of a block chain, maximizes the accuracy of the user side learning model, aims at minimizing payment and delay, and establishes an objective function:
Figure FDA0004069873560000011
s.t.φ d (x)≤budget
Figure FDA0004069873560000012
Figure FDA0004069873560000013
0≤|x|≤Size
wherein phi is d (x)=φ upd (x)·(1+kAcc(x))+userpr;beha=Num Buy /Num Query The method comprises the steps of carrying out a first treatment on the surface of the When beha is less than or equal to 0.1, userpr is more than or equal to 0; when beha is more than 0.1, userpr is less than 0; x and λ represent inputs, x= (x 1 ,x 2 ,...,x n ),x i ∈{0,1},i=1,2,...,n,x i Feature data representing an ith type of a given dataset, n representing a number of types of features in the given dataset;
Figure FDA0004069873560000014
λ a representing the ratio of the a-th task to the total task of the data buyer side unloaded to the edge server, 0 is less than or equal to lambda a ≤1;d a Representing the size of the a-th task input data; lambda (lambda) a d a Representing a computational task required on an edge computing node EN of an edge server; acc (x) represents the accuracy of the learning model at the user side, and 0 < Acc (x) < 1; alpha represents the precision parameter of the learning model at the user side; beta represents a payment parameter; ζ represents a parameter of data buyer side task processing delay; η represents a parameter of a blockchain delay; phi (phi) d (x) Representing a price; phi (phi) upd (x) Representing static pricing irrespective of the user-side learning model;
userpr represents a user behavioral rewards/penalty variable; pena represents the lower limit of the user behavior penalty variable, rew represents the upper limit of the user behavior penalty variable, pena is less than or equal to userpr is less than or equal to rew; k represents a price adjustment parameter; the budgets represent budgets; t (T) a EC Representing the processing delay of the a-th task;
Figure FDA0004069873560000015
a maximum limit value representing a task processing delay; />
Figure FDA0004069873560000016
Representing local tasks (1-lambda a )d a Is a local calculation time of (1); />
Figure FDA0004069873560000017
Representing the calculationTask lambda a d a From user data end U a Transmission time of edge computing node EN offloaded to edge server, +.>
Figure FDA0004069873560000021
Representing a calculation task lambda by wireless access a d a The computation time offloaded onto the edge computation node EN of the edge server; a=1, 2,.. E ,N E Representing the number of tasks; t (T) b BC Indicating the data buyer side and the data seller side b block transaction achievement time, b=1, 2,.. B ,N B Representing the number of blocks in the blockchain; />
Figure FDA0004069873560000022
A maximum limit value representing a transaction achievement time; />
Figure FDA0004069873560000023
Representing a b-th block packing time of the block chain; />
Figure FDA0004069873560000024
Representing the consensus time of the b-th block;
Figure FDA0004069873560000025
indicating the commit time of the b-th block; num (Num) Query Indicates the number of queries, num Buy Representing the number of purchases;
Figure FDA0004069873560000026
Figure FDA0004069873560000027
A maximum limit value representing the number of queries; />
Figure FDA0004069873560000028
A maximum limit value representing the number of purchases; size represents the most amount of data that a user queries and purchases in real timeA large limit; />
S3, solving the objective function, selecting characteristic data which accords with the objective function in a given data set, uploading the selected characteristic data to a block chain by a data seller terminal, paying and acquiring the data through the block chain by a data buyer terminal, and adding the data into a sample data set.
2. The method for distributed feature data selection in a power spot market according to claim 1,
Figure FDA0004069873560000029
Figure FDA00040698735600000210
d represents a given data set, including m tuples, each tuple having n types of features, the query issued by the data buyer is Q, t j A tuple representing the result of query Q at D, j=1, 2,..m; in D there is a table T 1 ,T 2 ,...,T tn
Figure FDA00040698735600000211
Is T 1 ,T 2 ,...,T tn Tn represents the number of sub-tables in D,/-sub-table>
Figure FDA00040698735600000212
Representing query Q versus D table T i A set of lineage tuples; />
Figure FDA00040698735600000213
Representing query Q versus D table T j Is a set of uncertain lineage tuples; />
Figure FDA00040698735600000214
Figure FDA00040698735600000215
Representing tuple t j Data quality of (2); sen represents the sensitivity of the user to quality, 0<sen<1,miss j Indicating the degree of bond deletion; delta represents a price coefficient for controlling a user price range; />
Figure FDA00040698735600000216
T now A set of lineage tuples representing current non-purchased data; p is p total Representing the overall price for a given dataset; ζ represents a coefficient of information entropy; />
Figure FDA00040698735600000217
Coefficients representing the integrity rate; integrity of j-th tuple->
Figure FDA0004069873560000031
index ij =1 indicates that the element of the ith row and column of the n row and column characteristic data of a given data set is present, and if not present, index ij =0; h is the information entropy of a given dataset; h (t) j ) Information entropy of the j-th tuple; w= (w) 1 ,w 2 ,...,w n ),w j A weight vector representing the j-th tuple of feature data; w (w) min Representing the minimum value, w, of the weight vector max Representing the maximum value of the weight vector.
3. The method for distributed feature data selection in a power spot market according to claim 1,
Figure FDA0004069873560000032
4. the method for selecting distributed feature data in a power spot market according to claim 1, wherein in S1, the method for determining a sample data set at a data buyer side includes:
data buyer side owns local data set D own
Data buyer end-to-local data set D own Repairing to obtain data set
Figure FDA0004069873560000033
Data buyer end-to-local data set D own Predicting to obtain data set
Figure FDA0004069873560000034
In dataset D own
Figure FDA0004069873560000035
And->
Figure FDA0004069873560000036
And training the user side learning model, determining the type and the data quantity of the power data lacking in the sample data set when the accuracy of the user side learning model is lower than the required accuracy, forming a query according to the type and the data quantity of the power data lacking, and sending the query to the data seller terminal, and returning the corresponding given data set to the data buyer terminal by the data seller terminal.
5. The method for distributed feature data selection in a power spot market according to claim 4, wherein the data buyer end pairs the local data set D own Repairing to obtain data set
Figure FDA0004069873560000037
Or the data buyer end to the local data set D own Prediction is carried out to obtain a data set +.>
Figure FDA0004069873560000038
The method of (1) comprises:
segmentation dataset D own Obtaining a training data set D train And test dataset D test Establishing a deep learning model by using a training data set D train Training the deep learning model, outputting the parameters of the trained deep learning model and the loss value of each iteration, and using the test data set D test Predicting to obtain an error value of the deep learning model prediction, adjusting parameters of the deep learning model, and repairing or predicting by using the deep learning model;
the deep learning model comprises two bidirectional LSTM layers, a multi-head attention layer, a maximum pooling layer, an average pooling layer and two fully connected layers, and a training data set D train The input data of the two-way LSTM layers are input to the two-way LSTM layers at the same time, the output of the two-way LSTM layers is input to the multi-head attention layer, the output of the multi-head attention layer is input to the maximum pooling layer and the average pooling layer at the same time, the output of the maximum pooling layer and the average pooling layer is input to one full-connection layer at the same time, the output of the full-connection layer is input to the other full-connection layer, and the output of the full-connection layer is output through the output layer.
CN202110763209.3A 2021-07-06 2021-07-06 Distributed characteristic data selection method in electric power spot market Active CN113435938B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110763209.3A CN113435938B (en) 2021-07-06 2021-07-06 Distributed characteristic data selection method in electric power spot market

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110763209.3A CN113435938B (en) 2021-07-06 2021-07-06 Distributed characteristic data selection method in electric power spot market

Publications (2)

Publication Number Publication Date
CN113435938A CN113435938A (en) 2021-09-24
CN113435938B true CN113435938B (en) 2023-05-16

Family

ID=77759197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110763209.3A Active CN113435938B (en) 2021-07-06 2021-07-06 Distributed characteristic data selection method in electric power spot market

Country Status (1)

Country Link
CN (1) CN113435938B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113919886A (en) * 2021-11-11 2022-01-11 重庆邮电大学 Data characteristic combination pricing method and system based on summer pril value and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110135994A (en) * 2019-05-21 2019-08-16 中国科学技术大学 The transaction processing method of time-sensitive data
CN110191148A (en) * 2019-03-29 2019-08-30 中国科学院计算技术研究所 A kind of statistical function distribution execution method and system towards edge calculations

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9819558B2 (en) * 2014-03-03 2017-11-14 Microsoft Technology Licensing, Llc Streaming query resource control
US10417709B2 (en) * 2015-08-12 2019-09-17 Chicago Mercantile Exchange Inc. Mitigation of latency disparity in a transaction processing system
CN110276369B (en) * 2019-04-24 2021-07-30 武汉众邦银行股份有限公司 Feature selection method, device and equipment based on machine learning and storage medium
CN110262845B (en) * 2019-04-30 2021-05-07 北京邮电大学 Block chain enabled distributed computing task unloading method and system
CN111090507B (en) * 2019-11-25 2023-06-09 南京航空航天大学 Task scheduling method and application based on cloud edge fusion server network architecture
CN111010434B (en) * 2019-12-11 2022-05-27 重庆工程职业技术学院 Optimized task unloading method based on network delay and resource management
CN111831418A (en) * 2020-07-14 2020-10-27 华东师范大学 Big data analysis job performance optimization method based on delay scheduling technology

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110191148A (en) * 2019-03-29 2019-08-30 中国科学院计算技术研究所 A kind of statistical function distribution execution method and system towards edge calculations
CN110135994A (en) * 2019-05-21 2019-08-16 中国科学技术大学 The transaction processing method of time-sensitive data

Also Published As

Publication number Publication date
CN113435938A (en) 2021-09-24

Similar Documents

Publication Publication Date Title
Li et al. Learning from history and present: Next-item recommendation via discriminatively exploiting user behaviors
Tang Reinforcement mechanism design.
US8620729B2 (en) Methods for supply chain management incorporating uncertainty
Pei et al. Value-aware recommendation based on reinforcement profit maximization
Backus et al. Dynamic demand estimation in auction markets
Zhang et al. Finding potential lenders in P2P lending: A hybrid random walk approach
Qi et al. Data‐driven research in retail operations—A review
Xue et al. Pricing personalized bundles: A new approach and an empirical study
Zhao et al. Impression Allocation for Combating Fraud in E-commerce Via Deep Reinforcement Learning with Action Norm Penalty.
Attar et al. Forecasting contractor's deviation from the client objectives in prequalification model using support vector regression
Chhabra et al. Learning the demand curve in posted-price digital goods auctions
CN113283671B (en) Method and device for predicting replenishment quantity, computer equipment and storage medium
Ge et al. Maximizing marginal utility per dollar for economic recommendation
Zhang et al. Optimizing multiple performance metrics with deep GSP auctions for e-commerce advertising
US20230306505A1 (en) Extending finite rank deep kernel learning to forecasting over long time horizons
CN111695024A (en) Object evaluation value prediction method and system, and recommendation method and system
Pei et al. Value-aware recommendation based on reinforced profit maximization in e-commerce systems
CN113435938B (en) Distributed characteristic data selection method in electric power spot market
Ghate Optimal minimum bids and inventory scrapping in sequential, single-unit, Vickrey auctions with demand learning
Cao et al. Linear-layer-enhanced quantum long short-term memory for carbon price forecasting
Huang et al. Multi-scale interest dynamic hierarchical transformer for sequential recommendation
Jiang et al. Intertemporal pricing via nonparametric estimation: Integrating reference effects and consumer heterogeneity
Yang et al. Learning customer preferences and dynamic pricing for perishable products
US20210248576A1 (en) System to facilitate exchange of data segments between data aggregators and data consumers
Zhao et al. Bank customer churn prediction based on support vector machine: Taking a commercial bank's VIP customer churn as the example

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant