CN113919886A - Data characteristic combination pricing method and system based on Shapley value, and electronic equipment - Google Patents

Publication number
CN113919886A
Authority
CN
China
Prior art keywords
data
feature
characteristic
value
pricing
Legal status
Pending
Application number
CN202111332244.6A
Other languages
Chinese (zh)
Inventor
余海燕
刘珂
缪红霞
Current Assignee
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202111332244.6A
Publication of CN113919886A
Priority to PCT/CN2022/126712 (published as WO2023082969A1)

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
                • G06Q30/00 Commerce
                    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
                        • G06Q30/0283 Price estimation or determination
                    • G06Q30/06 Buying, selling or leasing transactions
                        • G06Q30/08 Auctions
                • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
                    • G06Q40/03 Credit; Loans; Processing thereof
                    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F18/00 Pattern recognition
                    • G06F18/20 Analysing
                        • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                            • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
            • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N20/00 Machine learning

Abstract

The invention relates to machine learning, and in particular to a data feature combination pricing method and system based on the Shapley value, and to electronic equipment. The method comprises: collecting and preprocessing the feature variables of a feature data set provided by a seller; constructing a machine-learning model and selecting the optimal feature variables from the candidate feature variables; computing the marginal contribution and the average Shapley value of each selected feature variable using a Shapley value estimate constructed from ghost data instances; and judging, from the marginal contribution and average Shapley value of each feature variable, whether it can be traded, and if so, carrying out the transaction. The invention can help the data provider maximize long-term profit, support the data buyer's risk assessment of the enterprises concerned, and reduce risk losses.

Description

Data characteristic combination pricing method and system based on Shapley value, and electronic equipment
Technical Field
The invention relates to machine learning, and in particular to a data feature combination pricing method and system based on the Shapley value, and to electronic equipment.
Background
Advances in data analysis driven by machine learning and data mining have made the value of the big data being generated enormous, so data has become a new type of asset. Enterprises generate massive amounts of data during operation, and trading the collected data can increase and maximize their income. Unlike traditional commodities, data is large in volume, diverse, generated at high speed and reproducible, and its value depends heavily on timeliness: outdated data can sharply reduce its price. The value of data is also uncertain, diverse and sparse, so data pricing remains an open problem.
For example, a bank can analyze various data with financial technology, train machine learning models on purchased feature data and make predictions, which provides an important tool for addressing information asymmetry. In the business-loan process, the bank can use not only data about the enterprise held within the banking system but also valuable external data about the enterprise's operating capability. By purchasing such data and analyzing the enterprise's operating capability with machine learning, the bank can reduce loan risk. Capturing the enterprise's production and operation track provides reliable 'credit data' to financial institutions, raises the probability of successful loans, and lowers transaction costs and the threshold for credit services.
The data transaction is carried out through a third-party data transaction platform, which protects the privacy and security of the buyer's data to a certain degree, while dynamic market pricing ensures a reasonable price for the data buyer. The third-party platform prices the data to be purchased and provides the transaction parties with the data and the payment required. To protect the interests of the data seller and the platform, an enterprise that has purchased data must sign a privacy agreement with the platform: the data may only be used for the enterprise's own operations and may not be redistributed or resold.
The third-party data transaction platform builds a data feature selection model and a feature value allocation algorithm based on an approximate Shapley value, and from the results it can determine which feature variables have the greatest influence on the prediction and which have less. Buyers focus on the feature sets with greater influence and use the machine learning results to control risk and reduce losses to a certain extent. By purchasing the data, a bank obtains specific information about the relevant industry, which supports loan assessment and industry analysis and can reduce loan risk. At the same time, the data seller earns a profit.
The third-party transaction platform provides a dynamic data pricing method and system. A feature selection algorithm based on incremental prediction accuracy addresses the large number of data features and their redundancy: by combining a random forest prediction algorithm with recursive feature elimination, cross-validation and feature combination, data features can be selected effectively, and the selected features are then mined and analyzed. Because different data features contribute differently to the prediction, the invention provides a Shapley-value-based method for allocating feature contributions, which computes the effect of each feature (its marginal contribution to prediction accuracy). Finally, the monitored feature data being traded are priced dynamically using an auction and a weight-updating algorithm: building on the payment function of the Myerson optimal auction, an improved multiplicative weight update algorithm prices data features dynamically, so that the value of the data can be fully realized and enterprises gain extra income.
Disclosure of Invention
To enable a third-party data transaction platform to make full use of the features detected from enterprise products for data auctions, and to enable a buyer to extract key information from purchased data and learn about the industry of the data seller's enterprise, the invention provides a data feature combination pricing method and system based on the Shapley value, and electronic equipment. The method comprises the following steps:
collecting and preprocessing the feature variables of the feature data set provided by the seller;
constructing a machine-learning model and selecting the optimal feature variables from the candidate feature variables;
after the optimal variables are selected, computing the marginal contribution and the average Shapley value of each selected feature variable using a Shapley value estimate constructed from ghost data instances;
after the optimal variables are selected, allocating the value of each feature according to its marginal contribution using the Shapley value, quantifying the influence of the different input features on the trained model's output prediction, and retaining the features that meet the set marginal contribution;
and checking whether the data can be used for machine learning and trading; if so, the data buyer and the data seller set up the transaction, and the prediction of the constructed learning model for the current data is taken as the payment price of the data.
Further, selecting the optimal feature variables from the candidate feature variables comprises the following steps:
training the machine-learning model on the data of all feature variables;
ranking the feature variables by importance and selecting the k features with the highest importance values;
evaluating the model on a validation set, then recalculating and re-ranking the importance of each feature variable;
and splitting the training set into a new training set and a new validation set, training the model on the new training set with all feature variables, evaluating it on the new validation set, and computing and ranking the importance of all feature variables.
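The ranking-and-truncation steps above can be sketched as follows. This is a minimal illustration only: the source ranks features with a random-forest importance, which is not reproduced here; as a model-agnostic stand-in, the sketch uses permutation importance (the rise in validation mean squared error when one feature column is shuffled). The function names and the toy linear `predict` are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def permutation_importance(predict: Callable[[Sequence[float]], float],
                           X: List[List[float]], y: List[float],
                           n_repeats: int = 5, seed: int = 0) -> List[float]:
    """Mean increase in validation MSE when each feature column is shuffled."""
    rng = random.Random(seed)

    def mse(rows: List[List[float]]) -> float:
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    baseline = mse(X)
    importances = []
    for j in range(len(X[0])):
        increase = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increase += mse(shuffled) - baseline
        importances.append(increase / n_repeats)
    return importances


def top_k_features(importances: Sequence[float], k: int) -> List[int]:
    """Indices of the k most important features, most important first."""
    order = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
    return order[:k]


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical validation set; the toy model ignores feature 2.
    X_val = [[rng.random(), rng.random(), rng.random()] for _ in range(50)]
    predict = lambda r: 3.0 * r[0] + 0.5 * r[1]
    y_val = [predict(r) for r in X_val]
    print(top_k_features(permutation_importance(predict, X_val, y_val), k=2))
```

A feature the model never uses gets an importance of exactly zero, because shuffling it leaves every prediction unchanged; in a real pipeline `predict` would be the trained model and `X_val`, `y_val` the held-out split.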
Further, constructing the Shapley value estimate of a feature variable from ghost data instances comprises randomly drawing an instance from the data and constructing, as ghost data instances, one instance that has the feature and one instance that does not.
Further, the marginal contribution of the feature variable is expressed as:
$$\phi_j^{m} = \hat{f}\left(x_{+j}^{m}\right) - \hat{f}\left(x_{-j}^{m}\right)$$
where $\phi_j^{m}$ is the marginal contribution of the $j$-th feature of instance $x$ in the $m$-th iteration; $\hat{f}(x_{+j}^{m})$ is the prediction for instance $x$ in the $m$-th iteration using the instance that contains feature $j$, and $x_{+j}^{m}$ is the feature vector obtained in the $m$-th iteration by replacing the features that come after the $j$-th feature in $x$ with the corresponding features of a random instance $z$; $\hat{f}(x_{-j}^{m})$ is the prediction for instance $x$ in the $m$-th iteration using the instance that does not contain feature $j$, and $x_{-j}^{m}$ is the feature vector obtained in the $m$-th iteration by replacing the $j$-th feature of $x$ and the features after it with the corresponding features of $z$.
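The marginal-contribution definition above translates directly into a Monte Carlo estimator; averaging the per-iteration contributions gives the average Shapley value. The sketch below is an illustrative reading of that definition, assuming that "the features after the j-th feature" refers to a random feature ordering drawn in each iteration (as in the standard sampling estimator of Štrumbelj and Kononenko) and that `predict` is any callable model. The function name and parameters are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def shapley_mc(predict: Callable[[List[float]], float],
               x: Sequence[float], data: Sequence[Sequence[float]],
               j: int, n_iter: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate of the Shapley value of feature j for instance x."""
    rng = random.Random(seed)
    n = len(x)
    total = 0.0
    for _ in range(n_iter):
        z = rng.choice(data)        # random instance supplying "absent" feature values
        order = list(range(n))
        rng.shuffle(order)          # random feature ordering for this iteration
        before_j = set(order[:order.index(j)])
        # Ghost instance WITH feature j: j and the features before it come from x.
        x_plus = [x[i] if i in before_j or i == j else z[i] for i in range(n)]
        # Ghost instance WITHOUT feature j: only the features before j come from x.
        x_minus = [x[i] if i in before_j else z[i] for i in range(n)]
        total += predict(x_plus) - predict(x_minus)  # marginal contribution phi_j^m
    return total / n_iter


if __name__ == "__main__":
    model = lambda v: 2.0 * v[0] + 3.0 * v[1]
    print(shapley_mc(model, [1.0, 1.0], [[0.0, 0.0]], j=0))  # 2.0
```

For a linear model every iteration contributes w_j(x_j - z_j), so with a single background instance at the origin the estimate is exactly the coefficient times the feature value; with a larger background set it converges to w_j(x_j - mean z_j).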
Further, pricing the feature variables comprises the following steps:
S41: before trading with a data buyer, the data seller sets the price p_n of the data to be traded; given the number of buyers and their bids, the data buyer's revenue function is computed;
S42: the data buyer's final payment is computed from the buyer's revenue function; the data buyer pays the fee and the selected feature variables are traded;
S43: the seller updates the data price with the weight-updating algorithm and returns to S41 to start the next round of pricing.
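Steps S41 to S43 can be illustrated with a small multiplicative-weights (Hedge-style) loop over a grid of candidate prices. This is a sketch under explicit assumptions not stated in the source: the candidate price grid, the posted-price rule (the buyer pays p_n only when b_n ≥ p_n), and the full-information update in which every candidate price is rewarded with the revenue it would have earned from the observed bid. The function name and parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def mw_pricing(bids: Sequence[float], prices: Sequence[float],
               eta: float = 0.1, seed: int = 0) -> Tuple[float, List[float], List[float]]:
    """Multiplicative-weights price selection over a fixed grid of candidate prices."""
    rng = random.Random(seed)
    p_max = max(prices)
    weights = [1.0] * len(prices)
    revenue = 0.0
    chosen = []
    for b in bids:
        # S41: draw this round's price p_n with probability proportional to its weight.
        p = rng.choices(prices, weights=weights, k=1)[0]
        chosen.append(p)
        # S42: hypothetical posted-price rule, the buyer pays p_n only if b_n >= p_n.
        if b >= p:
            revenue += p
        # S43: reward every candidate with the revenue it would have earned this round.
        for i, q in enumerate(prices):
            gain = q if b >= q else 0.0
            weights[i] *= math.exp(eta * gain / p_max)
        total = sum(weights)
        weights = [w / total for w in weights]  # renormalize to avoid overflow
    return revenue, weights, chosen


if __name__ == "__main__":
    revenue, weights, _ = mw_pricing([7.0] * 200, [2.0, 4.0, 6.0, 8.0])
    print(revenue, [round(w, 4) for w in weights])
```

Because the weight update does not depend on which price was drawn, the final weights are deterministic for a given bid sequence; with every buyer bidding 7.0 against the grid [2, 4, 6, 8], the weight mass concentrates on 6, the best fixed price in hindsight.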
Further, the fee R_n paid by the data buyer is expressed as:
[Equation not reproduced]
where G(b_n, p_n) is the buyer's revenue function when the seller sets the price of the transaction data to p_n and the buyer bids b_n.
Further, the revenue function G is determined by the price of the transaction data set by the seller and the price bid by the buyer. For a fixed seller price p_n: while the bid b_n is below p_n, the buyer's revenue increases as b_n increases, reaching its maximum when b_n equals p_n; when b_n exceeds p_n, the buyer's utility stays at its maximum and the buyer's payment is at its maximum.
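The payment formula for R_n did not survive extraction, so it is not reconstructed here. For orientation only: the background section states that pricing builds on the payment function of the Myerson optimal auction, whose standard form for a monotone rule G is R = b·G(b) - ∫_0^b G(z) dz. The sketch below evaluates that standard form numerically with a hypothetical G(b, p) = min(b, p)/p, chosen only because it matches the shape described above (increasing until b_n = p_n, flat afterwards). Whether the patent's R_n takes exactly this form is an assumption, and the function names are hypothetical.

```python
from typing import Callable


def revenue_g(b: float, p: float) -> float:
    """Hypothetical revenue function: grows linearly with the bid, capped once b >= p (p > 0)."""
    return min(b, p) / p


def myerson_payment(b: float, p: float,
                    g: Callable[[float, float], float] = revenue_g,
                    steps: int = 10_000) -> float:
    """Standard Myerson payment b*g(b) - integral_0^b g(z) dz, via the trapezoidal rule."""
    h = b / steps
    integral = sum((g(i * h, p) + g((i + 1) * h, p)) * h / 2 for i in range(steps))
    return b * g(b, p) - integral


if __name__ == "__main__":
    for bid in (4.0, 10.0, 20.0):
        print(bid, round(myerson_payment(bid, p=10.0), 6))
```

With p_n = 10, bids of 4, 10 and 20 give payments of about 0.8, 5 and 5 (the closed forms are b²/(2p) below the price and p/2 above it): under this hypothetical G the payment stops growing once b_n ≥ p_n.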
Further, after each price is determined, when the same data is sold to multiple users, the data is priced according to its replication. If the data is replicated into i copies, the selling price S_n of each copy is:
[Equation image BDA0003349212720000041 not reproduced]
where S is the selling price when only one copy of the data exists and e is a penalty factor.
The invention provides a data feature combination pricing system based on the Shapley value, comprising a feature selection subsystem and a pricing subsystem: the feature selection subsystem screens the features, and the pricing subsystem runs a pricing auction on the screened features;
the feature selection subsystem comprises a machine learning model and a Shapley analysis model; the machine learning model is trained on the data and makes predictions, the resulting values are used as feature importances for ranking, and the K most important features are sent to the Shapley analysis model, which computes the marginal contribution and the average Shapley value of each feature variable;
and the pricing subsystem prices the data traded between the data seller and the data buyer.
The invention also provides electronic equipment for Shapley-value-based data feature combination pricing, comprising a processor and a memory, wherein the memory stores a program implementing the Shapley-value-based data feature combination pricing method according to claim 1, and the processor executes that program.
The invention has the following advantages:
1. For the feature selection problem in prediction, a feature selection method that combines cross-validated recursive feature elimination with feature permutation and combination is designed with prediction accuracy in mind.
2. The method adapts to feature selection under different prediction costs and selects the most valuable input features for the prediction model.
3. For allocating the prediction contributions of data features, an approximate Shapley value method provides global and local interpretations of the data features.
4. The designed data transaction model and real-time dynamic pricing algorithm can maximize the enterprise's long-term profit; meanwhile, the feature data obtained at auction supports loan-assessment decisions for data buyers such as banks or insurance companies, reducing losses on loans and claims.
5. The third-party trading platform visualizes the auction data on a trading control panel, so key information can be extracted quickly.
Drawings
FIG. 1 is a schematic diagram of the overall architecture of Shapley-value-based dynamic pricing of data feature combinations according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of Shapley-value-based feature selection and ranking of data according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of machine-learning-based feature Shapley values according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of auction pricing based on data features according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a dynamic data pricing control panel according to an embodiment of the present invention;
FIG. 6 is a schematic information-interaction diagram of the Shapley-value-based dynamic pricing method for data feature combinations according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of introducing a penalty function for data replicability based on the Shapley value according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a device for the Shapley-value-based dynamic pricing method for data feature combinations according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of another device for the Shapley-value-based dynamic pricing method for data feature combinations according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a data feature combination pricing method based on the Shapley value, which specifically comprises the following steps:
collecting and preprocessing the feature variables of the feature data set provided by the seller;
constructing a machine-learning model and selecting the optimal feature variables from the candidate feature variables;
computing the marginal contribution and the average Shapley value of the selected feature variables using a Shapley value estimate constructed from ghost data instances;
a quality inspector judges whether the data can be traded, and if so, the data is traded.
The invention also provides a method for quality inspection of data; this embodiment illustrates it with a sampling-assisted real-time data quality inspection method, which comprises:
when an annotator on the transaction platform completes a data annotation task, the quality inspector first handles missing data by deleting the columns (attributes) whose missing rate exceeds 10% and then deleting the rows (tuples) that still contain missing values; the annotated data then undergoes several rounds of manual inspection;
in the first round, the quality inspector inspects a sample of the annotated data, drawing 50% of it by random or stratified sampling; if all of the first-round data is qualified, 25% of the annotated data is inspected in the second round;
if more than 50% of the first-round annotated data is unqualified, the quality inspector inspects all of that annotator's labels in the second round;
if more than 10% but less than 50% of the first-round data is unqualified, the amount of data inspected in the second round is doubled relative to the first round;
if less than 10% of the first-round data is unqualified, the amount of data inspected in the second round is increased by 30% relative to the first round;
if less than 1% of the first-round data is unqualified, the data can be traded;
and the above inspection process is repeated until the data meets the trading condition.
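The missing-data cleaning and the second-round sample-size rules above can be written down as two small functions plus a trading check. This is a sketch under stated assumptions: records are dictionaries with `None` marking a missing value; boundary cases the text leaves open (a defect rate of exactly 10% or 50%) are assigned to the doubling branch; and because a 0% defect rate also satisfies the "<1%" trading condition, tradability is checked separately rather than folded into the sampling rule. All names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Record = Dict[str, Optional[float]]


def drop_missing(rows: List[Record], columns: Sequence[str],
                 max_missing_rate: float = 0.10) -> Tuple[List[Record], List[str]]:
    """Drop columns missing in more than 10% of rows, then rows with any gap left."""
    n = len(rows)
    keep = [c for c in columns
            if sum(r.get(c) is None for r in rows) / n <= max_missing_rate]
    cleaned = [{c: r[c] for c in keep} for r in rows
               if all(r.get(c) is not None for c in keep)]
    return cleaned, keep


def second_round_fraction(first_fraction: float, defect_rate: float) -> float:
    """Share of the labeled data to inspect in round two, given round-one results."""
    if defect_rate == 0.0:
        return 0.25                              # all qualified: spot-check 25%
    if defect_rate > 0.50:
        return 1.0                               # full inspection of this annotator
    if defect_rate >= 0.10:
        return min(1.0, 2.0 * first_fraction)    # double the sample
    return min(1.0, 1.3 * first_fraction)        # enlarge the sample by 30%


def tradable(defect_rate: float) -> bool:
    """Trading condition from the text: fewer than 1% unqualified labels."""
    return defect_rate < 0.01
```

With the 50% first-round sample from the text, a 20% defect rate already triggers a full second-round inspection, since doubling 50% reaches the whole data set.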
In this embodiment, a machine-learning model is built to make a prediction for a single instance, and the prediction is treated as a cooperative game: the "payout" is the instance's actual prediction minus the average prediction over all samples, and the Shapley value of a feature is its average marginal contribution over all feature orderings, so that the payout is divided fairly among the features.
The invention estimates the constructed feature Shapley values and addresses the allocation of feature value (contribution) based on the Shapley value. Specifically, the difference between the prediction for a specific instance and the average prediction over the data set is taken as the payout (income) of that instance; the presence or absence of a feature is simulated with two random instances; the marginal value of the feature for the specific instance is computed; and the mean of the absolute Shapley values is used as the global value of the feature in the data set. The data features in the experimental data set cooperate in the machine learning algorithm to produce a prediction. The Shapley value allocates the value of each feature according to its marginal value (contribution), quantifying the influence of different input features on the trained model's output; this allocation balances prediction accuracy against prediction cost and determines whether a feature is kept or dropped.
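The global measure described here (the mean absolute Shapley value of a feature across instances) and the resulting keep-or-drop ranking can be sketched as follows. The input matrix of local Shapley values, one row per instance and one column per feature, is assumed to come from an estimator such as the Monte Carlo approximation described in step 103; the function names and the example numbers are hypothetical.

```python
from typing import List, Sequence


def global_importance(local_shap: Sequence[Sequence[float]]) -> List[float]:
    """Mean absolute Shapley value of each feature over all instances."""
    n = len(local_shap)
    n_features = len(local_shap[0])
    return [sum(abs(row[j]) for row in local_shap) / n for j in range(n_features)]


def rank_features(local_shap: Sequence[Sequence[float]]) -> List[int]:
    """Feature indices ordered from most to least globally important."""
    imp = global_importance(local_shap)
    return sorted(range(len(imp)), key=lambda j: imp[j], reverse=True)


if __name__ == "__main__":
    # Hypothetical local Shapley values for 3 instances and 3 features.
    phi = [[0.4, -1.2, 0.0],
           [-0.6, 0.8, 0.1],
           [0.2, -1.0, -0.1]]
    print(global_importance(phi))
    print(rank_features(phi))
```

Taking absolute values keeps positive and negative contributions from cancelling each other; in the example the ranking is feature 1, then feature 0, then feature 2.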
Example 1
This embodiment provides a Shapley-value-based dynamic pricing method for data feature combinations; its overall structure is shown in FIG. 1, and the method comprises the following steps:
101. Associate the sensor data of the third-party data transaction platform with the related historical records to obtain a feature data set. The sensor data comes from data sets that data vendors collect with sensors throughout the various stages of production and operation.
102. Select and rank the collected feature data. First, cross-validated recursive feature elimination (CV-RFE) is performed: a random forest prediction model is trained on the original data features, which assigns a weight to each of the S candidate input features. The absolute values of the weights are then taken and the features with the smallest absolute weights are eliminated (several features may be eliminated in one round). Finally, the next round of training is performed on the new feature set. These steps form one round and are repeated recursively, shrinking the feature set under consideration until the number of remaining features reaches the required number. Cross-validation then determines the k features that increase prediction accuracy. The k features retained after CV-RFE and incremental-accuracy screening are then combined to enumerate all subsets, giving 2^k - 1 feature subsets (excluding the empty set). For each feature subset, the data is split into a training set and a validation set, the model is trained, and the subset's accuracy is computed; the accuracy is averaged over several iterations, and finally the feature subsets and their accuracies are output.
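The elimination loop and the subset enumeration in step 102 can be sketched as below. The sketch abstracts the models away: `fit_weights` (returns a weight per feature, standing in for the random forest) and `score` (returns a validation accuracy for a feature subset and a split seed) are hypothetical callables, and the cross-validated choice of k is assumed to have been made already.

```python
from itertools import combinations
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def rfe(features: Sequence[str],
        fit_weights: Callable[[List[str]], Dict[str, float]],
        n_keep: int, step: int = 1) -> List[str]:
    """Recursive feature elimination: retrain, drop the smallest |weight|, repeat."""
    remaining = list(features)
    while len(remaining) > n_keep:
        weights = fit_weights(remaining)
        n_drop = min(step, len(remaining) - n_keep)
        dropped = set(sorted(remaining, key=lambda f: abs(weights[f]))[:n_drop])
        remaining = [f for f in remaining if f not in dropped]
    return remaining


def nonempty_subsets(features: Sequence[str]) -> Iterator[Tuple[str, ...]]:
    """All 2**k - 1 non-empty feature subsets."""
    for r in range(1, len(features) + 1):
        yield from combinations(features, r)


def rank_subsets(features: Sequence[str],
                 score: Callable[[Tuple[str, ...], int], float],
                 n_repeats: int = 3) -> List[Tuple[Tuple[str, ...], float]]:
    """Average each subset's validation accuracy over several splits, best first."""
    results = []
    for subset in nonempty_subsets(features):
        acc = sum(score(subset, seed) for seed in range(n_repeats)) / n_repeats
        results.append((subset, acc))
    return sorted(results, key=lambda item: item[1], reverse=True)
```

The enumeration is only practical because CV-RFE has already cut the candidates down to k features; the number of subsets to train doubles with every additional feature.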
103. Estimate feature Shapley values constructed from ghost data instances. The Shapley value is a feature's contribution to the prediction; the value function is the payoff function of a coalition of players (features). Computing the exact Shapley value of the i-th feature requires evaluating the prediction for every coalition of feature values, with and without feature i. Because the number of coalitions grows exponentially with the number of features, an approximate Shapley value computed by Monte Carlo sampling is used instead:
$$\phi_j = \frac{1}{M}\sum_{m=1}^{M}\left(\hat{f}\left(x_{+j}^{m}\right) - \hat{f}\left(x_{-j}^{m}\right)\right)$$
where $\hat{f}(x_{+j}^{m})$ is the prediction for instance $x$ in the $m$-th iteration using the instance that contains feature $j$, in which the feature values after feature $j$ are replaced by those of a randomly sampled instance $z$, and $\hat{f}(x_{-j}^{m})$ is the corresponding prediction in which feature $j$ is also replaced. The vectors $x_{+j}^{m}$ and $x_{-j}^{m}$ are almost identical, except that the value of feature $j$ in $x_{-j}^{m}$ also comes from the random instance $z$; both are new samples combining values from $x$ and $z$. Given the properties of data features, using the Shapley value to allocate feature value according to prediction contribution yields a fair result. The specific calculation procedure is shown in Table 1.
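To make the exponential cost concrete, the exact Shapley value can be computed by enumerating every coalition and weighting each marginal contribution by |S|!(n-|S|-1)!/n!. This generic sketch (the function name and toy value functions are hypothetical) evaluates 2^(n-1) coalitions per player, which is why the Monte Carlo approximation above is used for realistic feature counts.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Hashable, Sequence


def exact_shapley(players: Sequence[Hashable],
                  value: Callable[[FrozenSet[Hashable]], float]) -> Dict[Hashable, float]:
    """Exact Shapley values by enumerating all coalitions (exponential in len(players))."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            # Probability that exactly this coalition precedes i in a random ordering.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


if __name__ == "__main__":
    w = {"a": 1.0, "b": 2.5, "c": -0.5}
    print(exact_shapley(list(w), lambda S: sum(w[p] for p in S)))
```

For an additive game each player's Shapley value equals its stand-alone value, and in every game the values sum to v(all players) - v(∅), the efficiency property that makes the division "fair" in the sense used above.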
Table 1: Payoff-division algorithm based on approximate Shapley values
[Table image BDA0003349212720000091 not reproduced]
104. Design the data feature auction mechanism: check, based on factors such as data quality, whether the data is suitable for machine learning and trading. If the data can be traded, the mechanism is designed; otherwise, it is not.
105. Build the dynamic data pricing system based on the weight-updating algorithm, together with the data transaction control panel. Based on the multiplicative weights idea and the characteristics of data transactions, a pricing algorithm based on multiplicative weight updates is designed to maximize the platform's long-term profit: it drives the difference between the revenue earned by the generated prices and the revenue of the best price in hindsight toward 0, maximizes the utilities of both buyer and seller, forms a healthy trading relationship, and realizes the value of the data. The data transaction control panel aggregates the resulting information, such as auction prices, and presents it in various visual forms.
Example 2
An important feature of the invention is the allocation of single-feature prediction contributions based on machine-learning feature selection and a feature Shapley value estimation algorithm, together with the correlation trends and global importance of the features. This embodiment illustrates this further.
In the model selection process (see FIG. 2), feature data is first collected with sensors and similar means (201) and divided into a training set (202) and a validation set (209). On the training set, optimal feature selection with a fixed number of features uses CV-RFE feature selection (203) to obtain the optimal number of features (204); optimal feature selection with a variable number of features uses feature combination and permutation (205) to determine the optimal combination for that number of features (206). Machine learning (207) is performed with the optimal number of features and optimal combination obtained on the training set to obtain a prediction model (208), and finally the model is validated (210) on the validation set (209) to obtain the optimal number of features and the optimal feature combination.
In the inference stage (see FIG. 3), the ranked feature vectors are fed into the machine learning prediction model (301) to obtain a prediction (302), and the feature data and predictions are then passed to a Shapley value analysis model (303). This yields the global importance of the features, the correlation trends between features and predictions, and the prediction contribution of each individual feature value. Based on the model, the Shapley analysis of the predictions has 2 levels: at the global level (305, 306), the distribution of Shapley values describes the specific influence, patterns and correlations of the features; at the local level (304), it gives the quantified contribution of each feature to each sample's prediction. The Shapley value algorithm makes it possible to weigh the cost of collecting data against the value contribution of each feature.
Example 3
Fig. 4 is a data characteristic auction transaction pricing mechanism of a weight updating algorithm disclosed by the embodiment of the invention. The weight updating process realizes the maximum profit of long-term operation by maintaining the weight of each pricing strategy and randomly selecting the strategy for repeated iteration. Assuming that a certain decision set comprises alpha alternative decisions corresponding to a specific profit beta (the profit is not prior), carrying out multiple rounds of selection on the alternative decisions, multiplying the current weight of each decision by a profit factor related to the current round of profit and updating the weight in each round, repeatedly making a selection by a decision party and obtaining the corresponding profit, highlighting the weight value of the strategy with the highest profit after multiple rounds, and obviously increasing the probability of selecting the strategy.
The core idea of the weight-update algorithm is explained using expert opinions to predict data auction prices. Suppose the auction price moves randomly and N experts form a set C whose opinions are used to predict the direction (fall or rise) of the auction price. Before each data auction, the advice of some expert i in C is selected to predict the trend; if the prediction is wrong, a cost of 1 is recorded, and if it is correct, the loss is 0. Because expert i is chosen at random, the algorithm aims, in order to make better decisions, to keep its predictions close to those of the best-performing expert over the long run: an expert who predicted correctly is more likely to be followed in the next round, and by maintaining weights over the expert group, each round follows the weighted majority opinion. All N experts start with weight 1, and each round's prediction is one of two outcomes (fall or rise). A parameter η (η < 0.5) is introduced as a gain-related factor, and each expert that predicted incorrectly has its weight multiplied by (1 − η) for the next round. After T rounds, the number of mistakes made by the algorithm is bounded above by
[Formula image: upper bound on the number of mistakes after T rounds]
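The expert-weighting rule just described can be sketched as the deterministic weighted-majority algorithm. This is an illustrative sketch with toy data; `weighted_majority` is a hypothetical name, and ties are assumed to resolve to "rise". For this deterministic rule with η ≤ 1/2, the standard guarantee given in Arora, Hazan and Kale's survey of the multiplicative weights method is M ≤ 2(1 + η)·m_i + 2·ln N / η for every expert i, where M is the algorithm's mistake count and m_i is expert i's; the text does not allow checking whether the patent's bound takes exactly this form, and the usage below only checks that this standard bound holds on one run.

```python
import math
import random
from typing import List, Sequence, Tuple


def weighted_majority(expert_preds: Sequence[Sequence[int]], outcomes: Sequence[int],
                      eta: float = 0.25) -> Tuple[int, List[int], List[float]]:
    """Deterministic weighted-majority prediction of fall (0) / rise (1)."""
    if not 0 < eta < 0.5:
        raise ValueError("eta must lie in (0, 0.5)")
    n = len(expert_preds[0])
    weights = [1.0] * n                      # every expert starts with weight 1
    expert_mistakes = [0] * n
    mistakes = 0
    for preds, actual in zip(expert_preds, outcomes):
        w_rise = sum(w for w, p in zip(weights, preds) if p == 1)
        w_fall = sum(weights) - w_rise
        guess = 1 if w_rise >= w_fall else 0  # follow the weighted majority
        mistakes += guess != actual           # cost 1 for a wrong call
        for i, p in enumerate(preds):
            if p != actual:                   # penalise experts who were wrong
                weights[i] *= 1 - eta
                expert_mistakes[i] += 1
    return mistakes, expert_mistakes, weights


rng = random.Random(1)
T, N = 200, 5
outcomes = [rng.randint(0, 1) for _ in range(T)]
# Toy data: expert 0 is right about 90% of the time, the others guess at random.
preds = [[y if rng.random() < 0.9 else 1 - y] + [rng.randint(0, 1) for _ in range(N - 1)]
         for y in outcomes]
eta = 0.25
m, per_expert, w = weighted_majority(preds, outcomes, eta)
bound = 2 * (1 + eta) * min(per_expert) + 2 * math.log(N) / eta
print(m <= bound)  # True
```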
The weight-update algorithm consists of the following four steps:
Step 1: The data seller sets the current data price $p_n$. There are N data buyers, who purchase the data in sequence, and data buyer n submits a quote $b_n$ for the data to be purchased. For every $n \in [N]$, the buyer's bid is drawn from a closed, bounded set B of diameter $D < \infty$, i.e. $b_n \in B$.
Step 2: The data buyer's revenue function is $G(p_n, b_n)$, which depends on the buyer's quote $b_n$ and the current price $p_n$; different quotes and different current prices yield different revenues for buyer n.
Step 3: Based on the current price $p_n$ and the buyer's quote $b_n$, the data seller determines the buyer payment function $RF(p_n, b_n)$, a Lipschitz function used to calculate the buyer's final payment:
$$|RF(p^{(1)}, b) - RF(p^{(2)}, b)| \le L\,|p^{(1)} - p^{(2)}|$$
where L is the Lipschitz constant, b is the buyer's quote, and $p^{(1)}, p^{(2)}$ are any two prices.
Step 4: The data buyer pays $R_n$ and takes away the data prediction result, completing a single transaction; the data seller updates the data price to $p_{n+1}$ and returns to Step 1 to start the next pricing round.
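The four steps can be sketched as a Hedge-style multiplicative-weights loop over a finite set of candidate prices (for example an ε-net of B). This is a sketch under stated assumptions, not the patent's exact rule: `smoothed_payment` is a hypothetical Lipschitz payment function standing in for RF, the exponential update exp(η·gain) is one standard multiplicative-weights form, and the seller is assumed to observe each bid after the sale so that every candidate price can be re-scored.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def smoothed_payment(p: float, b: float, delta: float = 0.1) -> float:
    """Hypothetical Lipschitz payment RF(p, b): the buyer pays the posted
    price p when b >= p; the payment ramps linearly down to 0 as p exceeds
    the bid by up to delta."""
    if b >= p:
        return p
    return p * max(0.0, 1.0 - (p - b) / delta)


def mw_pricing(bids: Sequence[float], price_grid: Sequence[float],
               payment: Callable[[float, float], float], eta: float,
               seed: int = 0) -> Tuple[List[float], float, List[float]]:
    rng = random.Random(seed)
    weights = [1.0] * len(price_grid)
    scale = max(price_grid)                   # payments never exceed the top price
    posted: List[float] = []
    revenue = 0.0
    for b in bids:
        p = rng.choices(price_grid, weights=weights)[0]   # Step 1: post p_n
        posted.append(p)
        revenue += payment(p, b)                          # Steps 2-4: buyer pays R_n
        # Full-information update: re-score every candidate price against the
        # revealed bid and boost the profitable ones.
        weights = [w * math.exp(eta * payment(q, b) / scale)
                   for w, q in zip(weights, price_grid)]
        top = max(weights)
        weights = [w / top for w in weights]              # rescale to avoid overflow
    return posted, revenue, weights


grid = [round(0.1 * k, 1) for k in range(1, 11)]   # candidate prices 0.1 .. 1.0
bids = [0.6] * 300                                 # toy buyers who all bid 0.6
posted, revenue, w = mw_pricing(bids, grid, smoothed_payment, eta=0.5)
print(grid[w.index(max(w))])  # 0.6
```

Because the update uses every candidate's counterfactual payment, the weights concentrate on the price that would have earned the most in hindsight, which is the sense in which the average regret against the best fixed price shrinks over the rounds.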
Assume that the payment function is Lipschitz and the quotes are bounded. Let $\{p_n : n \in [N]\}$ be the output of the pricing algorithm, let L be the Lipschitz constant of the payment function RF, and let $B_{max}$ be the largest element of the bounded bid set B. Then, by choosing the algorithm parameters as
[Formula image: choice of algorithm parameters]
the overall average regret is bounded by:
[Formula image: bound on the overall average regret]
where $B_{max}$ is the buyer's maximum bid in the set B, and $B_{net}(\varepsilon)$ is the minimal ε-net of B, meaning
[Formula image not reproduced]
for every $x \in B$ there exists $x_0 \in K$ such that $|x - x_0| \le \varepsilon$. The elements of $B_{net}(\varepsilon)$ are the different prices tested in the multiplicative-weights algorithm, and N is the number of different prices.
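For a one-dimensional bid set B = [lo, hi], a minimal ε-net can be written down directly. The helper below is a hypothetical illustration that assumes B is an interval (the text only says it is closed and bounded); its points would serve as the candidate prices tested by the multiplicative-weights algorithm.

```python
import math
from typing import List


def epsilon_net(lo: float, hi: float, eps: float) -> List[float]:
    """Smallest epsilon-net of the interval [lo, hi].

    Each point covers a window of width 2*eps, so ceil((hi - lo) / (2*eps))
    points suffice and no fewer can cover the whole interval.
    """
    if eps <= 0 or hi < lo:
        raise ValueError("need eps > 0 and lo <= hi")
    count = max(1, math.ceil((hi - lo) / (2 * eps)))
    # Centre the k-th window at lo + (2k + 1) * eps; clamp the last one into B.
    return [min(lo + (2 * k + 1) * eps, hi) for k in range(count)]


net = epsilon_net(0.0, 1.0, 0.05)
print(len(net))  # 10
# Every bid in B = [0, 1] (sampled on a fine grid) is within eps of a tested price.
print(all(min(abs(x / 1000 - q) for q in net) <= 0.05 + 1e-12 for x in range(1001)))  # True
```

A smaller ε makes the best tested price closer to the best price in B (the Lipschitz constant L turns that distance into lost revenue of at most Lε), at the cost of more candidate prices for the weight-update algorithm to learn over.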
Table 2. Notation for the data feature auction transaction pricing mechanism
[Table image not reproduced]
Example 4
This embodiment also discloses a data dynamic pricing control panel (see Fig. 5). The third-party data transaction platform enters descriptions of the auction data (501), such as the data's industry information and attributes, for data buyers to view. Multiple data buyers anonymously enter the auction market and decide, based on the data information, whether to participate in the auction (502); a buyer who participates submits a bid (503), and one who does not waits for the next data auction. If only one data buyer bids, that buyer obtains the data; if several buyers bid, the auction follows the "highest bid wins" principle. Finally, one buyer wins the bid (504) and obtains the data, while the remaining buyers wait for the next auction round. The third-party data transaction platform summarizes the transaction records obtained in the auction (505), such as transaction volume and transaction amount, and displays information such as transaction amounts, their proportions and the distribution of buyers by industry in various visual forms such as charts, helping the platform analyze the related information and make decisions.
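The winner-determination rule described above (a sole bidder obtains the data; otherwise the highest bid wins) can be sketched in a few lines. `resolve_auction` and the bidder names are hypothetical, and breaking ties in favour of the earliest bidder is an assumption the text does not settle.

```python
from typing import Dict, Optional, Tuple


def resolve_auction(bids: Dict[str, float]) -> Optional[Tuple[str, float]]:
    """Return (winner, winning bid), or None when nobody bids.

    A single bidder obtains the data; with several bidders the highest bid
    wins. Ties go to the bidder who bid first (dict insertion order).
    """
    if not bids:
        return None                 # every buyer waits for the next auction round
    winner = max(bids, key=bids.get)
    return winner, bids[winner]


print(resolve_auction({"buyer_a": 120.0}))                    # ('buyer_a', 120.0)
print(resolve_auction({"buyer_a": 120.0, "buyer_b": 150.0}))  # ('buyer_b', 150.0)
print(resolve_auction({}))                                     # None
```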
Example 5
This embodiment provides the information interaction process of the Shapley-value-based data feature combination pricing method. As shown in Fig. 6, it is described from the perspectives of the data buyer, the third-party data transaction platform server, and the seller's control panel terminal, and includes the following steps:
the third-party data transaction platform acquires (601) feature data from the various domains of the data seller's production and operation, then performs CV-RFE feature selection and sorting (603) on the acquired feature data to obtain feature data combinations and their ranking;
a Shapley value model (604) is applied to the feature data to obtain prediction contributions, feature-versus-prediction trends, global feature importance and so on;
the third-party data transaction platform dynamically prices the data for auction trading (605). The data buyer decides whether to purchase the data based on the value of the auction data and whether it can increase business revenue; if so, the buyer participates in the auction, and if not, the buyer does not. The third-party data transaction platform collects the auction prices and identifies abnormal auction prices, such as prices that are too low or too high (606);
the auction price information is summarized (607) and sent (608) to the control panel terminal, which publishes the result information (609).
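Step 606 does not say how "too low" and "too high" are judged. One simple, commonly used rule, assumed here purely for illustration, is Tukey's interquartile-range fences; `flag_abnormal_prices` and the factor k = 1.5 are hypothetical choices, and the prices are toy data.

```python
import statistics
from typing import List, Sequence


def flag_abnormal_prices(prices: Sequence[float], k: float = 1.5) -> List[float]:
    """Flag auction prices outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    if len(prices) < 2:
        return []                       # too few prices to estimate quartiles
    q1, _, q3 = statistics.quantiles(prices, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [p for p in prices if p < low or p > high]


print(flag_abnormal_prices([10, 11, 12, 10.5, 11.5, 100, 0.1]))  # [100, 0.1]
```

Quartile-based fences are used here because, unlike a mean-and-standard-deviation rule, they are not themselves dragged upward or downward by the extreme bids they are meant to catch.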
Because data can be replicated, copying dilutes the revenue of each data source (modeled by a penalty function), while the overall revenue allocation $S_n$ ($0 < S_n \le 1$) remains constant, where $S_n = G(p_n, b_n)$ is the data buyer's revenue. A penalty factor e ($0 < e < 1$) is introduced, and the data replication problem is addressed with a robust Shapley value algorithm (see Fig. 7). Let $R_n = PD(S_n, Y_n; M, G)$, where $S_n$ is the total revenue allocation, $Y_n$ is the prediction task, M is the machine-learning prediction algorithm, G is the prediction gain function, and PD is the robust Shapley value algorithm for revenue allocation (Table 3). Given a similarity metric SM, let $A'_i$ denote the i-th copy of data set A and $R_n(A)$ the payment allocation of A; the $R_n(A)$ output by the robust Shapley value algorithm (Table 3) is ε-replication robust.
Table 3. Robust Shapley value algorithm
[Algorithm image not reproduced]
Referring to Fig. 7, the market data set is illustrated with no copies, one copy, and two copies. When the data set has no copies (701), the overall revenue allocation is $S_n$. When one copy is on the market (702), there are two data sets in total and the penalty factor e is introduced, so each data set receives $\frac{1}{2}Se$. When there are two copies on the market (703), there are three data sets in total and the penalty becomes $e^2$, so each data set receives $\frac{1}{3}Se^2$, and so on: the more copies of a data set there are on the market, the less revenue each one is allocated. Once the price S has been determined, when the same data is sold to multiple users, it is priced according to the replication price; if the data is copied into i copies, the selling price $S_n$ of each copy is:
[Formula image: selling price $S_n$ of each copy in terms of S, e and i]
where S is the actual selling price when only one copy of the data exists, which differs from both the buyer's quote $b_n$ and the seller's set price $p_n$, and e is the penalty factor.
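The copy-dilution pattern of Fig. 7 can be checked with a small sketch in the spirit of the robust Shapley value: compute ordinary Shapley shares, then multiply each data set's share by e raised to its total similarity with the other data sets on the market. Everything here is an assumption for illustration, not the algorithm of Table 3: the value function (any one copy of A delivers the full revenue S), the similarity metric (1 for exact copies), and the names `exact_shapley` and `robust_allocation`. Under those assumptions, with i copies on the market each data set receives S·e^i/(i + 1), matching the 1/2·Se and 1/3·Se² cases described above.

```python
import itertools
from typing import Callable, Dict, FrozenSet, Sequence


def exact_shapley(players: Sequence[str],
                  value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley values: average marginal gain over all player orderings."""
    totals = {p: 0.0 for p in players}
    orders = list(itertools.permutations(players))
    for order in orders:
        coalition = set()
        for p in order:
            before = value(frozenset(coalition))
            coalition.add(p)
            totals[p] += value(frozenset(coalition)) - before
    return {p: t / len(orders) for p, t in totals.items()}


def robust_allocation(players: Sequence[str],
                      value: Callable[[FrozenSet[str]], float],
                      similarity: Callable[[str, str], float],
                      e: float) -> Dict[str, float]:
    """Scale each Shapley share by e ** (total similarity to the other data sets)."""
    phi = exact_shapley(players, value)
    return {p: phi[p] * e ** sum(similarity(p, q) for q in players if q != p)
            for p in players}


S, e = 1.0, 0.8
for i in range(3):                                   # 0, 1, 2 copies of data set A
    players = ["A"] + [f"A'{k}" for k in range(1, i + 1)]
    alloc = robust_allocation(players, lambda c: S if c else 0.0,
                              lambda p, q: 1.0, e)
    print(i, round(alloc["A"], 4), round(S * e ** i / (i + 1), 4))
# 0 1.0 1.0
# 1 0.4 0.4
# 2 0.2133 0.2133
```

Exact enumeration costs (number of data sets)! orderings, so it is only practical for a handful of sellers; larger markets would need a sampling estimate of the Shapley shares.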
Example 6
This embodiment discloses a schematic structural diagram of a device for the Shapley-value-based data feature combination dynamic pricing method. The device may be an electronic device and may include: a processor 801 that transmits information; a memory 802 that stores data such as feature data; a communication interface 803, i.e. the interface between the central processing unit and a standard communication subsystem; and a control panel 804 that displays the results. After feature selection and sorting are performed on enterprise production and operation data, the data analyzed with the Shapley value are auctioned, and the result is transmitted to the control panel terminal.
Referring to Fig. 9, a schematic structural diagram of another device for the Shapley-value-based data feature combination dynamic pricing method disclosed in this embodiment is shown. The pricing device may be an electronic device. The three algorithms of the invention are carried out as follows:
Feature selection and sorting: data obtained from cascaded sensors and files enter the device through the acquisition unit 901 and are sent to the calculation unit 902 for analysis. After the calculation unit 902 receives the signals, prediction, sorting and similar processing are performed on the feature data under the control unit 903; the storage unit 904 then stores the predicted and sorted feature data, and once storage is complete the result is passed to the output unit 905.
Shapley value estimation of features using ghost data instances: the sorted feature combination data are input through the acquisition unit 901 and sent to the calculation unit 902 for analysis; the control unit 903 sets up the computation of each instance's marginal contribution, mean value and so on, and the results are then stored and output.
Dynamic data pricing with the weight-update algorithm: the current data price set by the seller, the number of buyers and the buyers' quotes are input to the acquisition unit 901; the calculation unit 902 and the control unit 903 compute the buyer's payment from the buyer's revenue function and payment function; and the storage unit 904 stores the related data and outputs the seller's updated data price for the next round.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. A data feature combination pricing method based on the Shapley value, characterized by comprising the following steps:
collecting and preprocessing the feature variables of the feature data set provided by the seller;
constructing a learning model based on machine learning, and selecting optimal feature classification variables from the feature classification variables;
when the optimal variables are selected, estimating feature Shapley values constructed from ghost data instances, and calculating the marginal contributions and average Shapley values of the selected feature variables;
when the optimal variables are selected, using the Shapley value to allocate the value of each feature according to its marginal contribution, quantifying the influence of the different input features on the output prediction of the trained model, and retaining the features that meet the set marginal-contribution criterion;
and detecting whether the data can be used for machine learning and trading; if so, the data buyer and the data seller conduct the transaction, and the predicted value for the current data obtained through the constructed learning model is used as the payment price of the data.
2. The data feature combination pricing method based on the Shapley value according to claim 1, wherein selecting the optimal feature classification variables from the feature classification variables comprises the steps of:
training a machine-learning-based learning model using all the feature variable data;
ranking the feature variables by importance and selecting the top k features with the largest importance values;
evaluating the model on a validation set, recalculating the importance of each feature variable, and re-ranking;
and splitting the training set into a new training set and a new validation set, training the model on the new training set with all feature variables, evaluating it on the validation set, and calculating and ranking the importance of all feature variables.
3. The data feature combination pricing method based on the Shapley value according to claim 1, wherein estimating the Shapley values of the feature variables constructed from ghost data instances comprises randomly drawing an instance from the feature variables, constructing one instance that contains the feature and one instance that does not contain it, and using the two instances as ghost data instances.
4. The data feature combination pricing method based on the Shapley value according to claim 1, wherein the marginal contribution of a feature variable is expressed as:
$$\phi_j^{m} = \hat{f}\left(x_{+j}^{m}\right) - \hat{f}\left(x_{-j}^{m}\right)$$
where:
$\phi_j^{m}$ is the marginal contribution of the j-th feature of instance x in the m-th iteration;
$\hat{f}(x_{+j}^{m})$ is the prediction for instance x in the m-th iteration using the instance that contains feature j, and $x_{+j}^{m}$ is the feature vector obtained in the m-th iteration by randomly replacing the features after the j-th feature of instance x with the corresponding features of instance z;
$\hat{f}(x_{-j}^{m})$ is the prediction for instance x in the m-th iteration using the instance that does not contain feature j, and $x_{-j}^{m}$ is the feature vector obtained in the m-th iteration by randomly replacing the j-th feature of instance x and the features after it with the corresponding features of instance z.
5. The data feature combination pricing method based on the Shapley value according to claim 1, wherein pricing the feature variables comprises the steps of:
S41: before trading with the data buyers, the data seller sets the price $p_n$ of the trading data, obtains the number of buyers and the buyers' quotes, and calculates the data buyer's revenue function;
S42: calculating the data buyer's final payment according to the buyer's revenue function, whereupon the data buyer pays the fee and the selected feature variables are traded;
S43: the seller updates the data price based on the weight-update algorithm, returns to S41, and starts the next pricing round.
6. The data feature combination pricing method based on the Shapley value according to claim 1, wherein the fee $R_n$ paid by the data buyer is expressed as:
[Formula image: payment $R_n$ in terms of $G(b_n, p_n)$]
where $G(b_n, p_n)$ is the buyer's revenue function when the seller sets the price of the transaction data to $p_n$ and the buyer's quote is $b_n$.
7. The data feature combination pricing method based on the Shapley value according to claim 6, wherein the revenue function is determined by the price of the transaction data set by the seller and the buyer's quote: when the buyer's quote $b_n$ is less than the seller-set price $p_n$, the buyer's revenue increases as $b_n$ increases, reaching its maximum when $b_n$ equals $p_n$; when $b_n$ is greater than $p_n$, the buyer's utility remains at its maximum and the buyer's payment remains at its maximum.
8. The data feature combination pricing method based on the Shapley value according to claim 5, wherein, once the price S has been determined and the same data is sold to multiple users, the data is priced according to the replication price; if the data is copied into i copies, the selling price $S_n$ of each copy is:
[Formula image: selling price $S_n$ of each copy in terms of S, e and i]
where S is the selling price when only one copy of the data exists, and e is the penalty factor.
9. A data feature combination pricing system based on the Shapley value, characterized by comprising a feature selection subsystem and a pricing subsystem, wherein the feature selection subsystem filters features and the pricing subsystem conducts a pricing auction on the filtered features;
the feature selection subsystem comprises a machine learning model and a Shapley analysis model; the machine learning model is trained and makes predictions on the data, the predicted values are used as feature importances for ranking, and the K features with the largest importance are sent to the Shapley analysis model for analysis; the Shapley analysis model calculates the marginal contributions and average Shapley values of the feature variables;
and the pricing subsystem prices the data offered by the data seller to the data buyer.
10. An electronic device for Shapley-value-based data feature combination pricing, comprising a processor and a memory, wherein the memory stores the Shapley-value-based data feature combination pricing method according to claim 1, and the processor is capable of executing the method stored in the memory.
CN202111332244.6A 2021-11-11 2021-11-11 Data characteristic combination pricing method and system based on summer pril value and electronic equipment Pending CN113919886A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111332244.6A CN113919886A (en) 2021-11-11 2021-11-11 Data characteristic combination pricing method and system based on summer pril value and electronic equipment
PCT/CN2022/126712 WO2023082969A1 (en) 2021-11-11 2022-10-21 Data feature combination pricing method and system based on shapley value and electronic device

Publications (1)

Publication Number Publication Date
CN113919886A true CN113919886A (en) 2022-01-11

Family

ID=79246015

Country Status (2)

Country Link
CN (1) CN113919886A (en)
WO (1) WO2023082969A1 (en)

Also Published As

Publication number Publication date
WO2023082969A1 (en) 2023-05-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination